Sequential decoding for lossless streaming
source coding with side information 

Hari Palaiyanur, Student Member, Anant Sahai, Member 

Abstract 

The problem of lossless fixed-rate streaming coding of discrete memoryless sources with side information at the 
decoder is studied. A random time-varying tree-code is used to sequentially bin strings and a Stack Algorithm with 
a variable bias uses the side information to give a delay-universal coding system for lossless source coding with 
side information. The scheme is shown to give exponentially decaying probability of error with delay, with exponent 
equal to Gallager's random coding exponent for sources with side information. The mean of the random variable 
of computation for the stack decoder is bounded, and conditions on the bias are given to guarantee a finite $\rho$th
moment for $0 \le \rho \le 1$. Further, the problem is also studied in the case where there is a discrete memoryless channel
between encoder and decoder. The same scheme is slightly modified to give a joint source-channel encoder and Stack
Algorithm-based sequential decoder using side information. Again, by a suitable choice of bias, the probability of 
error decays exponentially with delay and the random variable of computation has a finite mean. Simulation results 
for several examples are given. 

Index Terms 

Data compression, side information, joint source-channel coding, sequential decoding, lossless source coding, 
Slepian-Wolf, error exponent, delay universal, stack algorithm, random variable of computation 

I. Introduction 

In this paper, we consider the problem of lossless source coding with side information shown in Figure 1. The seminal paper of Slepian and Wolf [1] was the first to give the achievable rate region for this problem, when the source consists of a pair of dependent random variables that are independent and identically distributed (IID) over time. A sequence of IID symbols is encoded and its compressed representation is given noiselessly to a decoder. The decoder also has access to side information that is correlated in a known way with the source. The side information generally permits the source to be compressed to a rate below its entropy and still be recovered losslessly. If the source is $U$ and the side information $V$, then [1] showed that the conditional entropy, $H(U|V)$, is a sufficient rate to recover $U$ with arbitrarily low probability of error.
The currently known, robust methods of compression used in point-to-point lossless source coding generally
employ variable length codes. Solutions such as Lempel-Ziv coding ([2], [3], [4]) and context-tree weighting [5] 
are also capable of efficiently compressing many sources with memory. Recently, these algorithms have been adapted 
to the 'compression with side information' problem when the side information is available to both the encoder and 
decoder. Cai, et al. [6] have shown how to modify the context-tree method to account for side information at the 
encoder. It is also possible to modify the Lempel-Ziv algorithms to account for side information at the encoder 
([7], [8]). 

The purpose of this paper, however, is to consider how to compress when the side information is available to 
the decoder only. This restriction disallows variable length codes as a generic solution. Variable length codes work 
because they assign short codewords to typical source strings and longer codewords to atypical strings. When 
the side information is available only to the decoder, the encoder cannot tell when the joint source is behaving 
atypically. As an example consider a binary equiprobable source U. Let V be the output of U passed through a 
binary symmetric channel with crossover probability 1/10. Every U source string of the same length occurs with 
equal probability, but clearly the side information allows the source to be compressed below 1 bit per symbol. 

One approach around this problem is to use block codes such as LDPC codes to give a 'structured' binning 
of the source strings. The side information is then used at the decoder to distinguish amongst the source strings 
in the received bin. In the same mold, it is also possible to use turbo codes, as done by Aaron, Girod, et al. ([9],
[10]). Regardless of the type of code, the lack of side information at the encoder necessitates a shift in
'complexity' from the encoder to the decoder. 

The idea of shifting complexity from encoder to decoder in lossless source coding is not new. In [11], Hellman 
suggested using convolutional codes for joint source-channel coding in applications such as deep-space communications
where computational effort at the encoder comes at a premium. Around the same time, papers of Koshelev
[12] and Blizard [13] suggested using convolutional codes in conjunction with sequential decoders for the purposes 
of data compression and joint source-channel coding. These ideas extend naturally to the subject of this paper, 
lossless source coding and joint source-channel coding with side information available to the decoder only. 

The approach of this paper is to use random, time-varying, infinite constraint length convolutional codes to sequentially 'bin' an IID source and a Stack Algorithm sequential decoder to (almost) losslessly recover it. The decoder has a variable 'bias' parameter, as in [14] by Jelinek, that allows for a tradeoff between probability of error and moments of the random variable of computation associated with the sequential decoder. The proof techniques are adaptations to source coding and joint source-channel coding of those of [14].

Fig. 1. Source coding with side information at rate R bits per time unit.

Table I shows the relation of this paper to some prior work. Our scheme is related to several lines of work in information
theory. As already mentioned, the main point of this paper is to extend the idea
of using convolutional encoding with sequential decoding for lossless source coding by modifying the decoder to 
allow the use of side information. 

In [12], Koshelev shows that there is a point-to-point source coding 'cutoff rate' for a stack-based sequential decoding algorithm. That is, if the rate is larger than the cutoff rate, then the expected computation performed by the sequential decoder is finite. Work in the opposite direction by Arikan and Merhav [28] showed that this cutoff rate is tight: if the rate is below the cutoff rate, the expected computation is infinite. Furthermore, [28] gives a lower bound to the cutoff rate for all moments of the random variable of computation, not only the mean. Our result regarding computation parallels Koshelev's, only with side information allowed at the decoder. We give an upper bound to the 'cutoff rate' of sequential decoding, for moments in the interval [0, 1], for lossless source coding with side information at the decoder. When the side information is independent of the source to be recovered, reducing to the point-to-point version of the problem, this cutoff rate coincides with that of [28].

One interesting aspect of our scheme is its 'anytime' or delay-universal nature. By using an infinite constraint-length convolutional code, it is possible to have a probability of error that goes to zero exponentially with delay.¹

¹ Delay is defined as the difference between the decoding time and the time the symbol entered the encoder.
For certain problems in distributed control ([29], [30]), an exponentially decreasing probability of error is required to guarantee plant stability (in a moment sense). For these problems, the error exponent with delay determines the moments of the plant state that can be stabilized. If there is a channel between the encoder and the side-information-aided decoder, the scheme presented in this paper achieves an error exponent with delay analogous to the point-to-point random block coding error exponent of Problem 5.16 of Gallager [16]. Recent work by Chang et al. ([31], [22], [23], [32]) has shown that, in general, the best block error exponents are much lower than the best error exponents with delay achievable for problems of lossless source coding with and without side information.

The paper is organized as follows. In Section II-A, we set up the problem of streaming source coding, perhaps with a noisy channel between encoder and decoder, and with side information available to the decoder. Then in Section II-B, we give a description of the encoder and decoder. Next, in Section III, we state the two main theorems. One theorem states the error exponent with delay for this scheme and the other gives an upper bound to the asymptotic distribution of computation when using the stack algorithm. The next section gives some examples and simulation results showing that the proposed scheme can be implemented with non-prohibitive complexity. In the conclusion, we discuss some open questions left in this specific line of work and some future directions. Finally, in the appendix, we give proofs of the theorems of the text.

II. Sequential data compression with side information 

A. Problem definition 

The source is modelled as a sequence of IID random variables $(U_i, V_i)$, $i \ge 1$, that take on values from a finite alphabet $\mathcal{U} \times \mathcal{V}$. Each $(U_i, V_i)$ is drawn according to a probability mass function $Q(u, v)$. With some abuse of notation, we will use $Q(u)$, for $u \in \mathcal{U}$, to denote the marginal probability $\sum_{v \in \mathcal{V}} Q(u, v)$. Similarly, $Q(v)$ will be the marginal $\sum_{u \in \mathcal{U}} Q(u, v)$ for $v \in \mathcal{V}$. Finally, $Q(u|v) = Q(u, v)/Q(v)$ for $(u, v) \in \mathcal{U} \times \mathcal{V}$. Without loss of generality, assume $Q(v) > 0$ for all $v \in \mathcal{V}$. If $U$ and $V$ are independent, the point-to-point source coding problem is recovered.

Our goal is to code the $U_i$ symbols causally into a fixed-rate bit stream so that the symbols can be recovered losslessly by a decoder, in the sense that a symbol $U_i$ is recovered with probability 1 in the limit of large decoding delay. For the reasons given in the introduction, we pursue a truly fixed-rate coding strategy that assigns the same number of bits to all sequences of the same length.

Figure 2 shows the setup of our 'streaming' source coding problem. At a discrete time instant $n$, the encoder has access to the source realization up through time $n$, which is denoted² $u_1^n$. Let the rate of the encoder be $R$ bits per source symbol. The encoder at time $n$ outputs $\lfloor nR \rfloor - \lfloor (n-1)R \rfloor$ bits that are a function of $u_1^n$. Based on the bits $B_1^{\lfloor nR \rfloor}$ and the side information $v_1^n$, the decoder at time $n$ gives its estimate of the source symbols up through time $n$, denoted as $\hat{u}_1^n(n)$.

$$ \mathcal{E}_n : \mathcal{U}^n \to \{0,1\}^{\lfloor nR \rfloor - \lfloor (n-1)R \rfloor} \tag{1} $$
$$ B_{\lfloor (n-1)R \rfloor + 1}^{\lfloor nR \rfloor} = \mathcal{E}_n(u_1^n) \tag{2} $$
$$ \mathcal{D}_n : \{0,1\}^{\lfloor nR \rfloor} \times \mathcal{V}^n \to \mathcal{U}^n \tag{3} $$
$$ \hat{u}_1^n(n) = \mathcal{D}_n\big(B_1^{\lfloor nR \rfloor}, v_1^n\big) \tag{4} $$

² We will use $z_i^j$ to denote the vector $(z_i, z_{i+1}, \ldots, z_j)$ if $i \le j$ and the null string if $i > j$.

Fig. 2. Sequential source coding with side information at rate $R$.
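As a small numerical illustration of (1) and (2), the sketch below lists how many bits the encoder emits at each time step for a rational rate $R$. Exact `Fraction` arithmetic is used so that the floors are not disturbed by floating-point rounding; the helper name `bits_per_step` is hypothetical.

```python
from fractions import Fraction
from math import floor


def bits_per_step(rate: Fraction, n_max: int) -> list:
    """Bits output at times n = 1..n_max: floor(nR) - floor((n-1)R), as in (1)."""
    return [floor(n * rate) - floor((n - 1) * rate) for n in range(1, n_max + 1)]


schedule = bits_per_step(Fraction(7, 10), 10)
print(schedule)       # [0, 1, 1, 0, 1, 1, 0, 1, 1, 1]
print(sum(schedule))  # 7, i.e. floor(10 * 7/10) bits after 10 source symbols
```

The per-step counts telescope to $\lfloor nR \rfloor$, which is why the decoder at time $n$ holds exactly $B_1^{\lfloor nR \rfloor}$.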

The only interesting values of $R$ lie in the interval $[H(U|V), \log_2 |\mathcal{U}|]$, since we need a rate of at least the conditional entropy to losslessly encode the source, and if $R > \log_2 |\mathcal{U}|$, we could just index the source sequences on a per-letter basis and losslessly recover them with no delay.

$$ H(U|V) \triangleq \sum_{v \in \mathcal{V}} Q(v) \sum_{u \in \mathcal{U}} Q(u|v) \log_2 \frac{1}{Q(u|v)} \tag{5} $$
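The conditional entropy (5) is easy to evaluate from a joint PMF. The sketch below assumes the joint PMF is given as a dictionary `{(u, v): Q(u, v)}` and uses hypothetical helper names; it applies (5) to the introduction's example of an equiprobable binary source observed through a binary symmetric channel with crossover probability 1/10.

```python
import math


def conditional_entropy(joint: dict) -> float:
    """H(U|V) in bits, eq. (5), from a joint PMF {(u, v): Q(u, v)}."""
    q_v = {}
    for (_, v), p in joint.items():
        q_v[v] = q_v.get(v, 0.0) + p
    h = 0.0
    for (u, v), p in joint.items():
        if p > 0:
            q_u_given_v = p / q_v[v]
            h += p * math.log2(1.0 / q_u_given_v)   # Q(v) Q(u|v) log2(1/Q(u|v))
    return h


def bsc_side_info(eps: float) -> dict:
    """Joint PMF of (U, V): U ~ Bernoulli(1/2), V = U passed through a BSC(eps)."""
    return {(u, v): 0.5 * (eps if u != v else 1 - eps) for u in (0, 1) for v in (0, 1)}


print(round(conditional_entropy(bsc_side_info(0.1)), 4))  # 0.469
```

So in that example a rate of roughly 0.47 bits per symbol suffices in principle, well below the 1 bit per symbol that the source needs without side information.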

There are two measures of performance that we will evaluate. The first is the tradeoff between probability of error and delay.

Definition 1: The probability of error with delay $d$, $P_e(d)$, is

$$ P_e(d) \triangleq \sup_{n} P\big(\hat{u}_1^n(n+d) \neq u_1^n\big) \tag{6} $$

This probability is taken over the randomness in the source and any randomness that may be present in the encoder or decoder. The error exponent with delay, or reliability exponent, $E(R)$, at the rate $R$ where the encoder/decoder operates is

$$ E(R) \triangleq \liminf_{d \to \infty} -\frac{1}{d} \log_2 P_e(d) \tag{7} $$

The second measure of performance lies in the random variable of computation. The motivation for developing 
sequential decoders has always been the opportunity to have a 'nearly optimal' decoder without exponentially 
growing complexity in block length or delay [33]. The amount of computation performed by our source decoder 
will be measured in the number of source sequences that are considered or compared against others. 
Definition 2: If $u_1^n$ is the true source realization at time $n \ge 1$, the $i$th incorrect subtree, $\mathcal{C}_i$, is

$$ \mathcal{C}_i = \{ z_1^n \in \mathcal{U}^n : n \ge i,\ z_1^{i-1} = u_1^{i-1} \text{ and } z_i \neq u_i \} \tag{8} $$

The $i$th random variable of computation, $N_i$, is the number of nodes in $\mathcal{C}_i$ that are ever examined by the decoder.

The definition of $N_i$ is a bit vague for arbitrary decoders but becomes concrete for sequential decoders, because the defining property of sequential decoders is essentially that they examine paths in a tree or trellis structure one by one.³

B. A random binning scheme with a stack decoder 

In this section, the encoder and decoder for the coding strategy of this paper are described. The encoder used is similar to the encoder used in the sequential source coding paper [31]. The bit sequence is produced by a random tree code, which can be implemented using a time-varying, infinite constraint length, random convolutional code. Figure 3 shows an example of such a code. We first envisage a uniform tree with $|\mathcal{U}|$ branches emanating from each node. The branches are numbered $1, 2, \ldots$ to denote the extension of the parent sequence by one symbol from $\mathcal{U}$. Hence, for all $k \ge 1$ there is a one-to-one correspondence between $|\mathcal{U}|$-ary strings of length $k$ and nodes at depth $k$ in the tree. These properties make clear that labelling the branches of the tree with an appropriate number of code bits yields a tree encoding of the source: a sequential source code.

The sequential random binning scheme we use is an ensemble of tree codes, with every bit on every branch drawn independently and identically as a Bernoulli(1/2) ($\mathcal{B}(1/2)$) random variable. This means that if source sequences $u_1^n$ and $z_1^n$ agree through time $n - d$, i.e. $u_1^{n-d} = z_1^{n-d}$ but $u_{n-d+1} \neq z_{n-d+1}$, the probability that $u_1^n$ and $z_1^n$ are placed in the same 'bin' is $2^{-dR}$. This is because the last $dR$ bits of the codewords for $u_1^n$ and $z_1^n$ are drawn IID $\mathcal{B}(1/2)$. We refer to the bits in the codewords of source sequences as 'parities' because we think of them as coming from a time-varying, infinite constraint length convolutional code.
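The pairwise binning property above can be checked by simulation. The sketch below is one possible realization, assumed for illustration rather than taken from the paper: branch labels are drawn lazily, so each tree node receives fresh fair bits the first time it is visited, which has the same distribution as drawing the whole tree in advance. For simplicity it assumes an integer number of bits per branch ($R$ = `bits_per_branch`) and a ternary source as in Fig. 3; all names are hypothetical.

```python
import random


class RandomTreeCode:
    """Random tree code whose branch labels are i.i.d. fair bits, sampled lazily."""

    def __init__(self, bits_per_branch: int, rng: random.Random):
        self.bits_per_branch = bits_per_branch
        self.rng = rng
        self.labels = {}  # source prefix (tuple) -> bits on the branch ending at that prefix

    def branch_label(self, prefix: tuple) -> tuple:
        if prefix not in self.labels:
            self.labels[prefix] = tuple(self.rng.getrandbits(1) for _ in range(self.bits_per_branch))
        return self.labels[prefix]

    def codeword(self, seq: tuple) -> tuple:
        """All parities sent through time len(seq): the labels along the path of seq."""
        bits = ()
        for t in range(1, len(seq) + 1):
            bits += self.branch_label(seq[:t])
        return bits


def same_bin_rate(n: int, d: int, bits_per_branch: int, trials: int, seed: int = 0) -> float:
    """Fraction of freshly drawn codes in which two ternary sequences that first
    differ at time n - d + 1 receive identical codewords through time n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        code = RandomTreeCode(bits_per_branch, rng)
        u = tuple(rng.randrange(3) for _ in range(n))
        z = list(u)
        z[n - d] = (u[n - d] + 1) % 3                      # first disagreement (0-based index n - d)
        z[n - d + 1:] = [rng.randrange(3) for _ in range(d - 1)]
        hits += code.codeword(u) == code.codeword(tuple(z))
    return hits / trials


print(same_bin_rate(n=6, d=3, bits_per_branch=1, trials=20000))  # should be near 2**-3 = 0.125
```

Only the last $d$ branch labels differ between the two paths, which is why the collision rate depends on $d$ and not on $n$.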

Decoding will be done by a stack algorithm, and hence is also sequential. For explanations of the stack algorithm, refer to [34], [35], or [36]. The following is the specific stack algorithm used. We initialize the stack with the root node having a metric of 0.

1) Let $u_1^k$ denote the (partial) source sequence at the top of the stack. Remove $u_1^k$ from the stack and consider each of its $|\mathcal{U}|$ extensions by one symbol from $\mathcal{U}$, i.e. $(u_1^k, u)$ for all $u \in \mathcal{U}$. Let $u_1^{k+1}$ be one of these extensions, and do the following for each of the extensions. If the parities of $u_1^{k+1}$ match the parities received,⁴ update the metric of $u_1^{k+1}$ and add it to the stack in a sorted way (highest metric on top). Otherwise discard $u_1^{k+1}$.
2) Let $u_1^k$ denote the sequence on top of the stack after all the relevant extensions have been added. If the length of $u_1^k$, $k$, is up to the current time, declare $u_1^k$ as the decoded source sequence so far. Otherwise repeat 1).

³ There is also some amount of 'internal' computation the decoder must do to determine the codewords of the source sequences. We assume an oracle gives the decoder any source codeword it wants at unit cost. This is somewhat significant in our random convolutional code implementation since the encoder's output depends on all previous source symbols. This means that as time increases, there is an increasing complexity to determining the bits assigned to a source symbol.

⁴ Note that the parities of $u_1^{k+1}$ will match those received if and only if the label on the branch extending $u_1^k$ to $u_1^{k+1}$ matches the $\lfloor (k+1)R \rfloor - \lfloor kR \rfloor$ parities received at time $k+1$.

The metrics are updated in an additive manner, with the metric of $u_1^{k+1}$ being $\Gamma(u_1^{k+1}) = \Gamma(u_1^k) + \Gamma(u_{k+1})$. For $(u, v) \in \mathcal{U} \times \mathcal{V}$, the metric for the source symbol $u$ given side information $v$ is

$$ \Gamma(u) \triangleq G + \log_2 Q(u|v) $$

The parameter $G$ is the 'bias' and controls to a large extent the amount of searching through the tree the algorithm performs. The bias is used as a normalizer so that the true path through the tree has a metric that is slowly increasing in time, while false path metrics are dropped to $-\infty$ by non-matching parities.
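To make steps 1) and 2) and the metric concrete, here is a minimal sketch for the binary example of the introduction ($U_i \sim \mathcal{B}(1/2)$ and $V_i$ equal to $U_i$ passed through a BSC($\epsilon$), so $Q(u|v)$ is $1-\epsilon$ or $\epsilon$). It is an illustration under stated assumptions, not the authors' simulator: encoder and decoder share one lazily sampled tree code as common randomness, the rate is rational so that the branch at time $t$ carries $\lfloor tR \rfloor - \lfloor (t-1)R \rfloor$ parities, and a whole block of $n$ symbols is decoded at once rather than in streaming fashion. Python's `heapq` is a min-heap, so metrics are stored negated. All names are hypothetical.

```python
import heapq
import math
import random
from fractions import Fraction


def branch_bits(t: int, rate: Fraction) -> int:
    """Number of parities on a branch at depth t: floor(tR) - floor((t-1)R)."""
    return math.floor(t * rate) - math.floor((t - 1) * rate)


class TreeCode:
    """Lazily sampled random tree code shared by encoder and decoder (common randomness)."""

    def __init__(self, rate: Fraction, seed: int):
        self.rate, self.rng, self.labels = rate, random.Random(seed), {}

    def label(self, prefix: tuple) -> tuple:
        if prefix not in self.labels:
            k = branch_bits(len(prefix), self.rate)
            self.labels[prefix] = tuple(self.rng.getrandbits(1) for _ in range(k))
        return self.labels[prefix]


def encode(code: TreeCode, u: list) -> list:
    """Parities sent at times 1..n (one tuple per time step)."""
    return [code.label(tuple(u[:t])) for t in range(1, len(u) + 1)]


def stack_decode(code: TreeCode, received: list, v: list, eps: float, bias: float):
    """Stack algorithm with metric Gamma(u) = G + log2 Q(u|v) for binary BSC side information."""
    n = len(received)
    stack = [(0.0, ())]          # entries: (-metric, prefix); heapq pops the highest metric first
    expansions = 0
    while True:
        neg_metric, prefix = heapq.heappop(stack)          # step 1: take the top of the stack
        if len(prefix) == n:                                # step 2: reached the current time
            return list(prefix), expansions
        expansions += 1
        t = len(prefix) + 1
        for sym in (0, 1):                                  # all |U| = 2 extensions
            ext = prefix + (sym,)
            if code.label(ext) == received[t - 1]:          # keep only parity-consistent paths
                q = 1 - eps if sym == v[t - 1] else eps     # Q(u|v) for the BSC example
                heapq.heappush(stack, (neg_metric - (bias + math.log2(q)), ext))


rng = random.Random(1)
n, eps, rate, bias = 40, 0.1, Fraction(7, 10), 0.7
u = [rng.getrandbits(1) for _ in range(n)]
v = [x ^ int(rng.random() < eps) for x in u]               # side information through a BSC(eps)
code = TreeCode(rate, seed=2)
received = encode(code, u)
u_hat, work = stack_decode(code, received, v, eps, bias)
print(sum(a != b for a, b in zip(u, u_hat)), "symbol errors,", work, "node expansions")
```

The true path always matches the received parities, so the stack never empties and the loop terminates. Symbols near the end of the block have had the least delay, so residual errors tend to sit there; the `expansions` counter is only a rough stand-in for the computation counted in Definition 2.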

C. Joint source-channel coding with side information 

Suppose there is a DMC between the encoder and the decoder. Let $W$ be its probability transition matrix, from a finite input alphabet $\mathcal{X}$ to a finite output alphabet $\mathcal{Y}$. Assume there are $\lambda > 0$ channel uses per source symbol.

$$ \mathcal{E}_n : \mathcal{U}^n \to \mathcal{X}^{\lfloor n\lambda \rfloor - \lfloor (n-1)\lambda \rfloor} \tag{9} $$
$$ x_{\lfloor (n-1)\lambda \rfloor + 1}^{\lfloor n\lambda \rfloor} = \mathcal{E}_n(u_1^n) \tag{10} $$
$$ \mathcal{D}_n : \mathcal{Y}^{\lfloor n\lambda \rfloor} \times \mathcal{V}^n \to \mathcal{U}^n \tag{11} $$
$$ \hat{u}_1^n(n) = \mathcal{D}_n\big(y_1^{\lfloor n\lambda \rfloor}, v_1^n\big) \tag{12} $$

The random binning encoder and the stack decoder of the previous section change only slightly. First, the branches of the encoding tree carry channel input symbols rather than $R$ bits. We will assume each channel input on the tree is drawn IID from a distribution $\beta(x)$ on $\mathcal{X}$. Secondly, the stack decoder can no longer discard paths based on parities. So, if $u$, $v$, $x_1^\lambda$ and $y_1^\lambda$ are respectively the source symbol on a branch, the side information symbol, the channel inputs on the branch and the channel outputs received by the decoder, then the decoder assigns a metric of

$$ \Gamma(u) = G + \log_2 \frac{Q(u|v)\, W(y_1^\lambda | x_1^\lambda)}{P(y_1^\lambda)} $$

where $P(y) = \sum_{x \in \mathcal{X}} \beta(x) W(y|x)$ and $P(y_1^\lambda) = \prod_{k=1}^{\lambda} P(y_k)$. The performance measures remain the same, with the error exponent at 'rate' $\lambda$ being $E(\lambda) = \liminf_{d \to \infty} -\frac{1}{d} \log_2 P_e(d)$.
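For a memoryless channel the channel part of this metric splits into one term per channel use. The sketch below computes the metric of a single branch under assumed dictionary representations (`q_cond[(u, v)]` for $Q(u|v)$, `W[x][y]` for $W(y|x)$, `beta[x]` for $\beta(x)$); the function name and the BSC example values are hypothetical.

```python
import math


def jscc_branch_metric(u, v, xs, ys, q_cond, W, beta, bias):
    """Gamma(u) = G + log2[Q(u|v) W(y_1^lam | x_1^lam) / P(y_1^lam)] for one branch.

    xs, ys: channel inputs on the branch and the corresponding received outputs.
    """
    metric = bias + math.log2(q_cond[(u, v)])
    for x, y in zip(xs, ys):
        p_y = sum(beta[xp] * W[xp][y] for xp in beta)  # P(y) = sum_x beta(x) W(y|x)
        metric += math.log2(W[x][y] / p_y)             # memoryless channel: per-use terms add
    return metric


W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}           # BSC(0.1)
beta = {0: 0.5, 1: 0.5}                                    # uniform channel inputs
q_cond = {(0, 0): 0.9, (1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.1}
print(jscc_branch_metric(0, 0, (0, 1), (0, 1), q_cond, W, beta, bias=0.0))  # log2(0.9) + 2*log2(1.8), about 1.544
```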
Fig. 3. An example of a tree code for a source with ternary alphabet. Here the rate R is one bit per source symbol.
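III. Main Results

A. Functions of interest

We start with the definition of some functions that appear in the theorem statements. The following functions of the channel input distribution, $\beta(x)$, and channel transition probability matrix, $W(y|x)$, appear in [14].

$$ E_0(\rho) \triangleq -\log_2 \sum_{y \in \mathcal{Y}} \Big( \sum_{x \in \mathcal{X}} \beta(x) W(y|x)^{\frac{1}{1+\rho}} \Big)^{1+\rho} \tag{13} $$
$$ F(\rho) \triangleq -\log_2 \sum_{y \in \mathcal{Y}} \Big( \sum_{x \in \mathcal{X}} \beta(x) W(y|x)^{\frac{1}{1+\rho}} \Big)^{\rho} \Big( \sum_{x \in \mathcal{X}} \beta(x) W(y|x) \Big)^{\frac{1}{1+\rho}} \tag{14} $$
$$ G(\rho) \triangleq -\log_2 \sum_{y \in \mathcal{Y}} \Big( \sum_{x \in \mathcal{X}} \beta(x) W(y|x)^{\frac{1}{1+\rho}} \Big) \Big( \sum_{x \in \mathcal{X}} \beta(x) W(y|x) \Big)^{\frac{\rho}{1+\rho}} \tag{15} $$

We define the following functions of the source distribution for $\rho \ge 0$. $E_{si}(\rho)$ can be found in [18] and the others are modifications of $E_{si}$.

$$ E_{si}(\rho) \triangleq \log_2 \sum_{v \in \mathcal{V}} Q(v) \Big( \sum_{u \in \mathcal{U}} Q(u|v)^{\frac{1}{1+\rho}} \Big)^{1+\rho} \tag{16} $$
$$ F_{si}(\rho) \triangleq \log_2 \sum_{v \in \mathcal{V}} Q(v) \Big( \sum_{u \in \mathcal{U}} Q(u|v)^{\frac{1}{1+\rho}} \Big)^{\rho} \tag{17} $$
$$ G_{si}(\rho) \triangleq \log_2 \sum_{v \in \mathcal{V}} Q(v) \sum_{u \in \mathcal{U}} Q(u|v)^{\frac{1}{1+\rho}} \tag{18} $$

If the side information is independent of $U$, then we get the simpler functions $E_s(\rho)$, $F_s(\rho)$ and $G_s(\rho)$ below.

$$ E_s(\rho) \triangleq (1+\rho) \log_2 \sum_{u \in \mathcal{U}} Q(u)^{\frac{1}{1+\rho}} \tag{19} $$
$$ F_s(\rho) \triangleq \rho \log_2 \sum_{u \in \mathcal{U}} Q(u)^{\frac{1}{1+\rho}} \tag{20} $$
$$ G_s(\rho) \triangleq \log_2 \sum_{u \in \mathcal{U}} Q(u)^{\frac{1}{1+\rho}} \tag{21} $$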
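The functions (13)-(18) can be evaluated numerically for concrete distributions. The sketch below assumes dictionary inputs (a joint PMF `{(u, v): Q(u, v)}`, and `W[x][y]`, `beta[x]` for the channel) and uses hypothetical function names. With independent side information it reproduces the point-to-point functions (19)-(21); for instance, $E_s(1) \approx 0.678$ for $Q(u) = (0.9, 0.1)$.

```python
import math


def source_functions(q_uv: dict, rho: float) -> tuple:
    """(E_si, F_si, G_si) of (16)-(18) from a joint PMF {(u, v): Q(u, v)} with Q(v) > 0."""
    q_v = {}
    for (_, v), p in q_uv.items():
        q_v[v] = q_v.get(v, 0.0) + p
    s = 1.0 / (1.0 + rho)
    # a[v] = sum_u Q(u|v)^{1/(1+rho)}
    a = {v: sum((p / q_v[v]) ** s for (_, vv), p in q_uv.items() if vv == v) for v in q_v}
    e_si = math.log2(sum(q_v[v] * a[v] ** (1 + rho) for v in q_v))
    f_si = math.log2(sum(q_v[v] * a[v] ** rho for v in q_v))
    g_si = math.log2(sum(q_v[v] * a[v] for v in q_v))
    return e_si, f_si, g_si


def channel_functions(W: dict, beta: dict, rho: float) -> tuple:
    """(E_0, F, G) of (13)-(15) for input distribution beta[x] and transitions W[x][y]."""
    s = 1.0 / (1.0 + rho)
    ys = {y for x in W for y in W[x]}
    a = {y: sum(beta[x] * W[x][y] ** s for x in beta) for y in ys}  # sum_x beta(x) W(y|x)^{1/(1+rho)}
    p = {y: sum(beta[x] * W[x][y] for x in beta) for y in ys}       # P(y)
    e0 = -math.log2(sum(a[y] ** (1 + rho) for y in ys))
    f = -math.log2(sum(a[y] ** rho * p[y] ** s for y in ys))
    g = -math.log2(sum(a[y] * p[y] ** (rho * s) for y in ys))
    return e0, f, g


# Independent side information reduces (16)-(18) to (19)-(21); here Q(u) = (0.9, 0.1).
indep = {(u, v): qu * 0.5 for u, qu in ((0, 0.9), (1, 0.1)) for v in (0, 1)}
print(source_functions(indep, 1.0))   # first entry, E_s(1), is about 0.678
bsc = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
print(channel_functions(bsc, {0: 0.5, 1: 0.5}, 1.0))
```

For any joint PMF one can also check numerically that $F_{si}(\rho) + G_{si}(\rho) \le E_{si}(\rho)$, which is consistent with the bias interval of Theorem 3 being non-empty whenever $R > E_{si}(\gamma)/\gamma$.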



B. Probability of error with delay 

Theorem 1 (Error exponent with delay for source coding with side information): Suppose that the decoder has access to the side information and there is a noiseless rate-$R$ binary channel between the encoder and decoder. Fix any $\epsilon > 0$ and let $\rho \in [0, 1]$. For the encoder/decoder of Section II-B, if the bias $G$ satisfies

$$ G < \frac{1+\rho}{\rho} \big[ E_{si}(\rho) - F_{si}(\rho) \big] \tag{22} $$

then there is a constant $K_\epsilon < \infty$ so that

$$ P_e(d) \le K_\epsilon \exp_2\!\big( -d(\rho R - E_{si}(\rho) - \epsilon) \big) \tag{23} $$

Hence, with suitable choice of bias, the error exponent with delay can approach

$$ E(R) = E_{r,si}(R) \triangleq \sup_{\rho \in [0,1]} \rho R - E_{si}(\rho) \tag{24} $$
If the side information is independent of the source $U$, then $E_{si}$, $F_{si}$ and $G_{si}$ simplify to $E_s$, $F_s$ and $G_s$, respectively. So, in the case of straight point-to-point lossless source coding, we arrive at a source coding equivalent of Gallager's random coding exponent:

$$ E_{r,pp}(R) \triangleq \max_{\rho \in [0,1]} \rho R - E_s(\rho) \tag{25} $$
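To see what (25) gives for the setting simulated in Section IV (an error sequence with PMF (0.9, 0.1) at rate $R = 0.7$), the maximization over $\rho$ can be done on a grid. The brute-force search below is chosen for transparency rather than efficiency, and the function names are hypothetical.

```python
import math


def e_s(q: list, rho: float) -> float:
    """E_s(rho) of (19): (1 + rho) * log2(sum_u Q(u)^{1/(1+rho)})."""
    return (1 + rho) * math.log2(sum(p ** (1 / (1 + rho)) for p in q if p > 0))


def random_coding_exponent(q: list, rate: float, grid: int = 10000):
    """Grid approximation of (25): max over rho in [0, 1] of rho*R - E_s(rho)."""
    candidates = ((i / grid * rate - e_s(q, i / grid), i / grid) for i in range(grid + 1))
    return max(candidates)   # (exponent, maximizing rho)


# Error sequence of Example 4.1 (crossover probability 0.1) at R = 0.7 bits/symbol
exponent, rho_star = random_coding_exponent([0.9, 0.1], 0.7)
print(round(exponent, 3), rho_star)   # exponent rounds to 0.052
```

The value of about 0.052 is consistent with the theoretical error exponent of 0.05 quoted in the comparison of Section IV. Below the entropy (here about 0.469 bits) the maximum sits at $\rho = 0$ and the exponent is zero.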

Theorem 2 (Error exponent with delay for joint source-channel coding with side information): Suppose there is a channel $W$ between the encoder and the decoder and side information is available to the decoder. Fix any $\epsilon > 0$ and let $\rho \in [0, 1]$. For the encoder/decoder of Sections II-B and II-C, if the bias $G$ satisfies

$$ G < \frac{1+\rho}{\rho} \big[ E_{si}(\rho) - F_{si}(\rho) - \lambda E_0(\rho) + \lambda F(\rho) \big] \tag{26} $$

then there is a constant $K_\epsilon < \infty$ so that

$$ P_e(d) \le K_\epsilon \exp_2\!\big( -d(\lambda E_0(\rho) - E_{si}(\rho) - \epsilon) \big) \tag{27} $$

Hence, with suitable choice of bias, the error exponent with delay can be

$$ E(\lambda) = E_{r,jscsi}(\lambda) \triangleq \sup_{\rho \in [0,1]} \lambda E_0(\rho) - E_{si}(\rho) \tag{28} $$

By assuming the side information to be independent of the source, we once again have a scheme for joint source-channel coding. The error exponent achieved is the joint source-channel equivalent of Gallager's random coding exponent ([16], Problem 5.16):

$$ E_{r,jsc}(\lambda) \triangleq \max_{\rho \in [0,1]} \lambda E_0(\rho) - E_s(\rho) \tag{29} $$

The exponent of (29) is in general lower than the joint source-channel exponent of Csiszár [37].
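As an illustration of (29), suppose (hypothetically; this pairing is not one of the paper's examples) that the error sequence of Example 4.1 is sent over a BSC(0.1) with uniform inputs. For that channel, (13) has the closed form $E_0(\rho) = \rho - (1+\rho)\log_2\big(p^{1/(1+\rho)} + (1-p)^{1/(1+\rho)}\big)$. The grid search below approximates $E_{r,jsc}(\lambda)$ for a few values of $\lambda$; the names are hypothetical.

```python
import math


def e0_bsc(p: float, rho: float) -> float:
    """Gallager's E_0(rho) of (13) for a BSC(p) with uniform inputs beta = (1/2, 1/2)."""
    s = 1 / (1 + rho)
    return rho - (1 + rho) * math.log2(p ** s + (1 - p) ** s)


def e_s(q: list, rho: float) -> float:
    """E_s(rho) of (19)."""
    return (1 + rho) * math.log2(sum(x ** (1 / (1 + rho)) for x in q if x > 0))


def jsc_exponent(q: list, p: float, lam: float, grid: int = 1000) -> float:
    """Grid approximation of (29): max over rho in [0, 1] of lam * E_0(rho) - E_s(rho)."""
    return max(lam * e0_bsc(p, i / grid) - e_s(q, i / grid) for i in range(grid + 1))


q, p = [0.9, 0.1], 0.1   # hypothetical pairing: Example 4.1's error sequence over a BSC(0.1)
for lam in (1.5, 2.0, 3.0):
    print(lam, round(jsc_exponent(q, p, lam), 4))
```

Since $\rho = 0$ always gives the value 0, the exponent is non-negative, and it grows with the number of channel uses per source symbol $\lambda$.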

C. Random variable of computation 

Theorem 3 (Computation of stack decoder with side information): Suppose that the decoder has access to the side information and there is a rate-$R$ noiseless binary channel between the encoder and decoder. Fix any $\gamma \in [0, 1]$. For the encoder/decoder of Section II-B, if the bias $G$ satisfies

$$ \frac{1+\gamma}{\gamma}\, G_{si}(\gamma) < G < \frac{1+\gamma}{\gamma} \big( \gamma R - F_{si}(\gamma) \big) \tag{30} $$

then the $\gamma$th moment of computation is uniformly finite for all $i$, i.e. $\exists\, K < \infty$ such that $\forall\, i$, $E[N_i^\gamma] < K$, if

$$ R > \frac{E_{si}(\gamma)}{\gamma} \tag{31} $$

In proving the theorem, we also show that the interval of viable bias values implicit in (30) is in fact non-empty if $R > E_{si}(\gamma)/\gamma$.

By restricting to the point-to-point case, we see that the $\gamma$th moment of computation, for $\gamma \in [0, 1]$, can be finite if

$$ R > \frac{E_s(\gamma)}{\gamma} \tag{32} $$
This result has been known for $\gamma = 1$, due to Koshelev [12]. We conjecture that Theorem 3 remains true for $\gamma > 1$. This conjecture is supported by simulation, but unproven. Using the results found in [28], it can be established that if $R < E_s(\gamma)/\gamma$, then $E[N_i^\gamma]$ cannot be uniformly bounded. Together, these results tell us that our stack decoder is doing as well as could be hoped for any sequential decoder in terms of the moments of computation for the point-to-point case.
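For the point-to-point setting of the simulations in Section IV (error-sequence PMF (0.9, 0.1), $R = 0.7$ and bias 0.7), the interval (30) with $\gamma = 1$ can be evaluated directly from (20) and (21). The sketch below (hypothetical helper name) does this. In the point-to-point case $G_s(\gamma) + F_s(\gamma) = E_s(\gamma)$, so the interval has width $\frac{1+\gamma}{\gamma}(\gamma R - E_s(\gamma))$ and is non-empty exactly when (32) holds.

```python
import math


def bias_interval(q: list, rate: float, gamma: float) -> tuple:
    """Point-to-point version of (30): ((1+g)/g) G_s(g) < G < ((1+g)/g) (g R - F_s(g))."""
    log_sum = math.log2(sum(p ** (1 / (1 + gamma)) for p in q if p > 0))
    g_s, f_s = log_sum, gamma * log_sum            # (21) and (20)
    scale = (1 + gamma) / gamma
    return scale * g_s, scale * (gamma * rate - f_s)


low, high = bias_interval([0.9, 0.1], rate=0.7, gamma=1.0)
print(round(low, 3), round(high, 3))   # 0.678 0.722
```

The bias of 0.7 used in the simulations lies inside this interval, so Theorem 3 guarantees a finite mean computation there.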

Theorem 4 (Computation of stack decoder for joint source-channel coding with side information): Suppose there is a channel $W$ between the encoder and the decoder and side information is available to the decoder. Fix any $\gamma \in [0, 1]$. For the encoder/decoder of Sections II-B and II-C, if the bias $G$ satisfies

$$ \frac{1+\gamma}{\gamma} \big( G_{si}(\gamma) - \lambda G(\gamma) \big) < G < \frac{1+\gamma}{\gamma} \big( \lambda F(\gamma) - F_{si}(\gamma) \big) \tag{33} $$

then the $\gamma$th moment of computation is uniformly finite for all $i$, i.e. $\exists\, K < \infty$ such that $\forall\, i$, $E[N_i^\gamma] < K$, if

$$ \lambda E_0(\gamma) > E_{si}(\gamma) \tag{34} $$

Again, in the appendix, we show that if $\lambda E_0(\gamma) > E_{si}(\gamma)$, then the interval of acceptable bias values in (33) is non-empty.

By removing the side information, we see that the condition needed for a finite $\gamma$th moment of computation, for $\gamma \in [0, 1]$, is:

$$ \lambda E_0(\gamma) > E_s(\gamma) \tag{35} $$

The condition of (35) once again has a matching converse, which can be found in [28].

In Section VI-G it is shown that the error exponent is positive when the bias is set as suggested in Theorems 3 and 4. Hence, the decoder is actually decoding correctly, and the finite average computation is not simply an artifact of the Stack Algorithm blindly following an incorrect path.

D. Proof Outline 

The proofs are the source coding analogs of the proofs of Theorems 2 and 3 of [14]. We give a proof outline for Theorem 1 for the point-to-point lossless source coding case, as the important ideas are all present without the excess notation.

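Assume that $G < \frac{1+\rho}{\rho} \big[ E_s(\rho) - F_s(\rho) \big]$. We will show that for any $\epsilon > 0$, there is a $K_\epsilon < \infty$ such that

$$ P_e(d) \le K_\epsilon \exp_2\!\big( -d(\rho R - E_s(\rho) - \epsilon) \big) \tag{36} $$

We can assume $\rho R - E_s(\rho) - \epsilon > 0$; otherwise there is nothing to prove.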

The error event of interest, $F_d$, is referred to in [14] as a failure event of depth $d$ and is defined in (37); we will relate it to $P_e(d)$ at the end of the proof. Figure 4 shows paths that may lead to an error event of depth 3 occurring, i.e. $F_3$.

$$ F_d = \Big\{ \exists\, \tilde{u}_1^d,\ \tilde{u}_1 \neq u_1 : \Gamma(\tilde{u}_1^d) \ge \min_{1 \le k \le d} \Gamma(u_1^k) \text{ and the parities of } \tilde{u}_1^d \text{ match } B_1^{\lfloor dR \rfloor} \Big\} \tag{37} $$
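The event $F_d$ can be subdivided into events $F_{d,k}$ so that $F_d = \bigcup_{k=1}^{d} F_{d,k}$, where

$$ F_{d,k} = \Big\{ \exists\, \tilde{u}_1^d,\ \tilde{u}_1 \neq u_1 : \Gamma(\tilde{u}_1^d) \ge \Gamma(u_1^k) \text{ and the parities of } \tilde{u}_1^d \text{ match } B_1^{\lfloor dR \rfloor} \Big\} \tag{38} $$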

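Let $U_1^d$ be the random vector of the first $d$ source symbols and let $u_1^d$ be an arbitrary vector in $\mathcal{U}^d$. By conditioning on the source sequence and applying the union bound, we get

$$ P(F_d) = \sum_{u_1^d \in \mathcal{U}^d} Q(u_1^d)\, P(F_d \mid U_1^d = u_1^d) \tag{39} $$
$$ \le \sum_{k=1}^{d} \sum_{u_1^d \in \mathcal{U}^d} Q(u_1^d)\, P(F_{d,k} \mid U_1^d = u_1^d) \tag{40} $$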



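Suppose $\tilde{u}_1^d$ is a false path that causes $F_{d,k}$ to occur. This means its parities match the received bits and its metric $\Gamma(\tilde{u}_1^d)$ is at least $\Gamma(u_1^k)$. Therefore,

$$ 0 \le \Gamma(\tilde{u}_1^d) - \Gamma(u_1^k) \tag{41} $$
$$ = \sum_{i=1}^{k} \log_2 \frac{Q(\tilde{u}_i)}{Q(u_i)} + \sum_{i=k+1}^{d} \log_2 Q(\tilde{u}_i) + (d-k)G \tag{42} $$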

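Now, denoting by $1(\cdot)$ the indicator function of its argument, and using a Gallager-style union bound, for $\rho \in [0, 1]$ we have

$$ P(F_{d,k} \mid U_1^d = u_1^d) \le E\Big[ \Big( \sum_{\tilde{u}_1^d :\, \tilde{u}_1 \neq u_1} 1(\tilde{u}_1^d \text{ causes } F_{d,k} \text{ to occur}) \Big)^{\rho} \,\Big|\, U_1^d = u_1^d \Big] \tag{43} $$
$$ \le \Big( \sum_{\tilde{u}_1^d :\, \tilde{u}_1 \neq u_1} E\big[ 1(\tilde{u}_1^d \text{ causes } F_{d,k} \text{ to occur}) \,\big|\, U_1^d = u_1^d \big] \Big)^{\rho} \tag{44} $$
$$ = \Big( \sum_{\tilde{u}_1^d :\, \tilde{u}_1 \neq u_1} A_k(u_1^d, \tilde{u}_1^d) \Big)^{\rho} \tag{45} $$

Here (43) holds because the probability that at least one false path causes $F_{d,k}$ is at most $E[\min\{1, N\}] \le E[N^\rho]$, where $N$ is the number of such paths, and (44) follows from Jensen's inequality. Continuing with the bounding, we use the fact that the parity generation process is independent of everything else⁵ to get

$$ A_k(u_1^d, \tilde{u}_1^d) = E\big[ 1(\text{parities of } \tilde{u}_1^d \text{ match } B_1^{\lfloor dR \rfloor}) \cdot 1(\Gamma(\tilde{u}_1^d) \ge \Gamma(u_1^k)) \,\big|\, U_1^d = u_1^d \big] \tag{46} $$
$$ = E\big[ 1(\text{parities of } \tilde{u}_1^d \text{ match } B_1^{\lfloor dR \rfloor}) \,\big|\, U_1^d = u_1^d \big] \; E\big[ 1(\Gamma(\tilde{u}_1^d) \ge \Gamma(u_1^k)) \,\big|\, U_1^d = u_1^d \big] $$
$$ \le \exp_2(-dR) \cdot \exp_2\!\Big( \frac{1}{1+\rho} \big( \Gamma(\tilde{u}_1^d) - \Gamma(u_1^k) \big) \Big) \tag{47} $$

where the last step uses $1(x \ge 0) \le 2^{x/(1+\rho)}$. Substituting for $A_k(u_1^d, \tilde{u}_1^d)$, and removing the restriction that $\tilde{u}_1 \neq u_1$,

$$ P(F_{d,k} \mid U_1^d = u_1^d) \le \Big( \sum_{\tilde{u}_1^d} \exp_2(-dR) \exp_2\!\Big( \frac{1}{1+\rho} \Big[ \log_2 \frac{Q(\tilde{u}_1^k)}{Q(u_1^k)} + \log_2 Q(\tilde{u}_{k+1}^d) + (d-k)G \Big] \Big) \Big)^{\rho} \tag{48} $$
$$ = \exp_2\!\Big( -\rho dR + \frac{\rho}{1+\rho}(d-k)G \Big) \Big( \sum_{\tilde{u}_1^d} \Big( \frac{Q(\tilde{u}_1^k)}{Q(u_1^k)} \Big)^{\frac{1}{1+\rho}} Q(\tilde{u}_{k+1}^d)^{\frac{1}{1+\rho}} \Big)^{\rho} \tag{49} $$
$$ = \exp_2\!\Big( -\rho dR + \frac{\rho}{1+\rho}(d-k)G \Big) \Big( \sum_{\tilde{u}_1^k} \Big( \frac{Q(\tilde{u}_1^k)}{Q(u_1^k)} \Big)^{\frac{1}{1+\rho}} \Big)^{\rho} \Big( \sum_{\tilde{u}_{k+1}^d} Q(\tilde{u}_{k+1}^d)^{\frac{1}{1+\rho}} \Big)^{\rho} \tag{50} $$

⁵ Note that we need only pairwise independence of the parities along two paths.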
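Equation (50) follows from the standard algebra of interchanging sums and products. Finally, we are ready to complete the bound of $P(F_d)$.

$$ P(F_d) \le \sum_{k=1}^{d} 2^{-d\rho R + (d-k)\frac{\rho}{1+\rho}G} \sum_{u_1^d} Q(u_1^d) \Big( \sum_{\tilde{u}_1^k} \Big( \frac{Q(\tilde{u}_1^k)}{Q(u_1^k)} \Big)^{\frac{1}{1+\rho}} \Big)^{\rho} \Big( \sum_{\tilde{u}_{k+1}^d} Q(\tilde{u}_{k+1}^d)^{\frac{1}{1+\rho}} \Big)^{\rho} \tag{51} $$
$$ = \sum_{k=1}^{d} 2^{-d\rho R + (d-k)\frac{\rho}{1+\rho}G} \Big[ \sum_{u_1^k} Q(u_1^k)^{\frac{1}{1+\rho}} \Big( \sum_{u_1^k} Q(u_1^k)^{\frac{1}{1+\rho}} \Big)^{\rho} \Big] \Big( \sum_{u_{k+1}^d} Q(u_{k+1}^d)^{\frac{1}{1+\rho}} \Big)^{\rho} \tag{52} $$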

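We get (52) by summing out $u_{k+1}^d$ and noting that the $\tilde{u}$'s are just dummy variables, so we are free to replace them with $u$'s. Next, we use the IID property of the source along with standard algebra to get to an exponential form. For example, we have

$$ \sum_{u_1^k} Q(u_1^k)^{\frac{1}{1+\rho}} \Big( \sum_{u_1^k} Q(u_1^k)^{\frac{1}{1+\rho}} \Big)^{\rho} = \Big( \sum_{u \in \mathcal{U}} Q(u)^{\frac{1}{1+\rho}} \Big)^{k(1+\rho)} \tag{53} $$
$$ = \exp_2\big( k E_s(\rho) \big) \tag{54} $$

Similarly,

$$ \Big( \sum_{u_{k+1}^d} Q(u_{k+1}^d)^{\frac{1}{1+\rho}} \Big)^{\rho} = \Big( \sum_{u \in \mathcal{U}} Q(u)^{\frac{1}{1+\rho}} \Big)^{(d-k)\rho} \tag{55} $$
$$ = \exp_2\big( (d-k) F_s(\rho) \big) \tag{56} $$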
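Steps (53)-(56) rest on the IID factorization $\sum_{u_1^k} Q(u_1^k)^{s} = \big(\sum_{u} Q(u)^{s}\big)^k$ with $s = 1/(1+\rho)$. The sketch below checks (53)-(56) by brute-force enumeration of all sequences for a small ternary source; the PMF and the values of $\rho$, $k$ and $d-k$ are arbitrary choices for illustration, and the helper name is hypothetical.

```python
import itertools
import math


def sum_power(q: list, length: int, s: float) -> float:
    """Sum over all sequences of the given length of Q(sequence)^s, for an IID source."""
    return sum(math.prod(q[i] for i in seq) ** s
               for seq in itertools.product(range(len(q)), repeat=length))


q, rho, k, m = [0.5, 0.3, 0.2], 0.6, 4, 3   # m plays the role of d - k
s = 1 / (1 + rho)
log_single = math.log2(sum(p ** s for p in q))   # log2 sum_u Q(u)^{1/(1+rho)}

lhs_53 = sum_power(q, k, s) * sum_power(q, k, s) ** rho   # left side of (53)
lhs_55 = sum_power(q, m, s) ** rho                        # left side of (55)
print(lhs_53, 2 ** (k * (1 + rho) * log_single))          # (54): exp2(k E_s(rho))
print(lhs_55, 2 ** (m * rho * log_single))                # (56): exp2((d-k) F_s(rho))
```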



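A bit more algebra and the condition on the bias give:

$$ P(F_d) \le \sum_{k=1}^{d} \exp_2\!\Big( -d\rho R + (d-k)\frac{\rho}{1+\rho}G + kE_s(\rho) + (d-k)F_s(\rho) \Big) \tag{57} $$
$$ = \exp_2\!\Big( d\Big( \frac{\rho}{1+\rho}G + F_s(\rho) - \rho R \Big) \Big) \sum_{k=1}^{d} \exp_2\!\Big( k\Big( E_s(\rho) - F_s(\rho) - \frac{\rho}{1+\rho}G \Big) \Big) \tag{58} $$
$$ \le \exp_2\!\Big( d\Big( \frac{\rho}{1+\rho}G + F_s(\rho) - \rho R \Big) \Big) \cdot d \exp_2\!\Big( d\Big( E_s(\rho) - F_s(\rho) - \frac{\rho}{1+\rho}G \Big) \Big) \tag{59} $$
$$ = d \exp_2\!\big( -d(\rho R - E_s(\rho)) \big) \tag{60} $$

Step (59) holds because the bias condition makes $E_s(\rho) - F_s(\rho) - \frac{\rho}{1+\rho}G$ positive, so every term of the sum is at most the $k = d$ term. So, now we have for any $\epsilon > 0$,

$$ P(F_d) \le K_\epsilon \exp_2\!\big( -d(\rho R - E_s(\rho) - \epsilon) \big) \tag{61} $$
$$ K_\epsilon \triangleq \max_{d \ge 1} d\, 2^{-d\epsilon} < \infty \tag{62} $$

Note that $K_\epsilon < \infty$ and is independent of $d$ because $\log_2(d)/d$ goes to 0. Finally, we can prove the statement of the theorem. In order for a delay-$d$ or greater error to occur, it must be that $\hat{u}_i(n+d) \neq u_i$ for some $1 \le i \le n$.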
Now, assuming the bias satisfies the required condition, we have

$$ P\big(\hat{u}_1^n(n+d) \neq u_1^n\big) = \sum_{k=0}^{n-1} P\big(\hat{u}_1^k(n+d) = u_1^k,\ \hat{u}_{k+1}(n+d) \neq u_{k+1}\big) \tag{63} $$
$$ \le \sum_{k=0}^{n-1} P(F_{d+n-k}) \tag{64} $$
$$ \le \sum_{k=0}^{\infty} P(F_{d+k}) \tag{65} $$
$$ \le \sum_{k=0}^{\infty} K_\epsilon\, 2^{-(d+k)(\rho R - E_s(\rho) - \epsilon)} \tag{66} $$
$$ = 2^{-d(\rho R - E_s(\rho) - \epsilon)} \sum_{k=0}^{\infty} K_\epsilon\, 2^{-k(\rho R - E_s(\rho) - \epsilon)} \tag{67} $$
$$ = K'_\epsilon \exp_2\!\big( -d(\rho R - E_s(\rho) - \epsilon) \big) \tag{68} $$

where $K'_\epsilon = K_\epsilon / \big(1 - 2^{-(\rho R - E_s(\rho) - \epsilon)}\big) < \infty$. The critical step is (64), which says that if the decoded path and true path agree until time $k$, the error event can be thought of as 'rooted' at time $k+1$. Hence, we are reduced to the error event $F_{d+n-k}$. The ideas used in the proof of the computation bound are essentially the same.
[Figure: ternary tree with the true source sequence drawn at the top; labels "Time 3" and "Potentially $F_3$ causing paths".]

Fig. 4. A ternary tree. The true source sequence is shown above. The condition $\tilde{u}_1 \neq u_1$ selects a portion of the tree containing paths that could potentially cause the error event $F_3$.
IV. Simulations

The random time-varying encoder and stack decoder were simulated in software using a random number generator, and the 'experimental' results are compared with the theory for verification. The probability of error with delay, $P_e(d)$, is the first quantity examined experimentally. Since the probability of error decays exponentially with delay, its logarithm decays linearly with delay. That is,

$$ \log_2 P_e(d) \approx -E(R)\, d $$

The slope of the line on a $\log_2$-plot is thus the negative of the error exponent achieved by this scheme.
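A least-squares fit is one simple way to read the slope off such a plot. The sketch below (hypothetical function name) fits $\log_2 P_e(d)$ against $d$ and returns the negated slope as an estimate of $E(R)$. It is exercised on noiseless synthetic data with an exponent of 0.05, chosen only to test the code and not taken from the paper's measurements.

```python
import math


def fit_exponent(delays: list, error_probs: list) -> float:
    """Least-squares slope of log2 P_e(d) versus d, negated: an estimate of E(R)."""
    xs = delays
    ys = [math.log2(p) for p in error_probs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope


delays = list(range(10, 101, 10))
synthetic = [3.0 * 2 ** (-0.05 * d) for d in delays]   # P_e(d) = K 2^{-0.05 d}, no noise
print(round(fit_exponent(delays, synthetic), 6))        # 0.05
```

The constant $K$ only shifts the intercept, which is why it does not affect the fitted exponent. With noisy empirical estimates the fit is sensitive to which delays are included, in line with the remark below that slopes are reliable only to about the first digit.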

Further, if we assume that the moments of computation at any time are the same as the moments of computation in any incorrect subtree, we can compare the Pareto exponent of the simulation to the theory. This is done by plotting $\log_2 P(C > n)$ versus $\log_2 n$, where $C$ is the number of computations performed at a time step. The fact that the distribution of computation is asymptotically Paretian should yield

$$ \log_2 P(C > n) \approx -\gamma \log_2 n $$

where $\gamma$ is the Pareto exponent of computation.
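The Pareto exponent can be read off an empirical tail in the same way. The sketch below (hypothetical function name) fits $\log_2 P(C > n)$ against $\log_2 n$ by least squares. It is exercised on synthetic samples with a known exponent of 1.2, a value picked only to echo the experimental estimate reported in Fig. 8, not on the paper's data.

```python
import math
import random


def pareto_exponent(samples: list, thresholds: list) -> float:
    """Least-squares estimate of gamma from log2 P(C > n) = -gamma log2 n + const."""
    xs, ys = [], []
    for n in thresholds:
        tail = sum(c > n for c in samples) / len(samples)   # empirical P(C > n)
        if tail > 0:                                         # skip empty tails (log of 0)
            xs.append(math.log2(n))
            ys.append(math.log2(tail))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope


# Synthetic check (not simulation data): P(C > n) = n^(-1.2) for n >= 1,
# sampled by inverse transform; 1 - random() lies in (0, 1], so no division by zero.
rng = random.Random(0)
samples = [(1.0 - rng.random()) ** (-1 / 1.2) for _ in range(200000)]
print(pareto_exponent(samples, [2 ** k for k in range(1, 9)]))  # should be near 1.2
```

The largest thresholds rest on few samples, so in practice the fitted range has to balance tail depth against statistical noise.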

A. Point to point 

Example 4.1: We explore an example of point-to-point lossless source coding that will be comparable to the case when side information is available at the decoder only. The source $U_i$ is a sequence of IID $\mathcal{B}(1/2)$ random bits. The $V_i$ are generated by passing $U_i$ through a binary symmetric channel (BSC) with crossover probability $\epsilon = 0.1$. In this example, we consider the case when the side information is available at both the encoder and decoder. The situation is diagrammed in Figure 6. Since $V$ is available at both the encoder and decoder, compressing $U \oplus V$ is equivalent to compressing $U$. Figure 5 shows the relevant source coding functions for the error random variable $U \oplus V$. Since we are just encoding the noise, the rate must be at least $H(U|V) = H_b(\epsilon)$, where $H_b$ is the binary entropy function.

We experimentally estimate the error exponent with delay and the Pareto exponent of computation. These are shown in Figures 7 and 8, respectively. We see that we can achieve the random coding error exponent, and that the Pareto exponent guaranteed by Theorem 3 holds. Since the bias value (0.7) is actually too high to guarantee achieving $E_{r,pp}(R)$ at rate $R = 0.7$, the error exponent in the experiment is somewhat surprising. However, we stress again that fitting a line to the curve is somewhat arbitrary, and we cannot expect precise values of the slope beyond the first digit.

B. Side information 

[Figure 5: curves labeled Random Coding Exponent, Block Coding Bound, and Required rate to have a finite γ-th moment, plotted against Rate in bits]
Fig. 5. Functions of interest associated with a binary source with PMF (0.9,0.1).

Fig. 6. An example of point-to-point source coding that can be compared to source coding with side information at the decoder. $U_i$ are Bernoulli(1/2) random bits, $V$ is $U$ passed through a BSC with crossover probability $\epsilon$. The encoder sequentially bins the error sequence $U \oplus V$.

Example 4.2: We reuse the binary source example, where the side information is generated by passing the source bit through a BSC. The side information this time is only available at the decoder, as is shown in Figure 9. In this case, the function $E_{si}(\rho)$ simplifies^6 as below:

$$E_{si}(\rho) = \log_2 \sum_{v} \Big(\sum_{u} Q(u,v)^{\frac{1}{1+\rho}}\Big)^{1+\rho} \tag{69}$$

$$= \log_2 \sum_{v} Q(v)\Big(\sum_{u} Q(u|v)^{\frac{1}{1+\rho}}\Big)^{1+\rho} \tag{70}$$

$$= \log_2 \sum_{v=0}^{1} \frac{1}{2}\Big(\epsilon^{\frac{1}{1+\rho}} + (1-\epsilon)^{\frac{1}{1+\rho}}\Big)^{1+\rho} \tag{71}$$

$$= (1+\rho)\log_2\Big(\epsilon^{\frac{1}{1+\rho}} + (1-\epsilon)^{\frac{1}{1+\rho}}\Big) \tag{72}$$
This $E_{si}(\rho)$ is the same as the $E_s(\rho)$ function that appears if the side information $V$ is available at both the encoder and decoder, i.e., point-to-point coding of the error sequence. To compare to the case when $V$ is available at the encoder as well, we estimate the error exponent with delay and the Pareto exponent for computation through simulation in Figures 10 and 11, respectively. In this simulation, the rate is once again 0.7 bits per symbol, and the bias is 0.7. We see nearly identical values for the error exponent and Pareto exponent in the two examples, as we should.

6 This expansion is for the reviewer's convenience; it will be removed in the final version.

[Table: Comparison of theoretical and experimental performance in example 4.1]

Fig. 7. Estimating E(R) for example 4.1.

[Figure 8: random variable of computation plotted against $\log_2(N)$, annotated with $E_s(1) = 0.678$ and Experimental Pareto Exponent = 1.2]
Fig. 8. Estimating the Pareto exponent for computation for example 4.1.

Fig. 9. An example of lossless source coding with side information at the decoder only. $U_i$ are Bernoulli(1/2) random bits, $V$ is $U$ passed through a BSC with crossover probability $\epsilon$. The encoder sequentially bins its observations of $U$.

[Table: Comparison of theoretical and experimental performance in example 4.2. Columns: Theoretical, Experimental. Error exponent with delay: 0.05 (theoretical), ~0.08 (experimental). Pareto exponent of computation: > 1 (theoretical). A further row is labeled Conjectured Pareto exponent.]
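As a numerical check of the simplification of $E_{si}(\rho)$ for this example, the Python sketch below computes $E_{si}(\rho) = \log_2 \sum_v Q(v)(\sum_u Q(u|v)^{1/(1+\rho)})^{1+\rho}$ directly from a joint PMF $Q(u,v)$. It then compares the result with the closed form $(1+\rho)\log_2(\epsilon^{1/(1+\rho)} + (1-\epsilon)^{1/(1+\rho)})$ for the source of Example 4.2. The dictionary representation of $Q$ and the helper names are our own choices.

```python
import math
from collections import defaultdict


def e_si(rho, joint):
    """E_si(rho) = log2 sum_v Q(v) (sum_u Q(u|v)^(1/(1+rho)))^(1+rho),
    for a joint PMF given as a dict {(u, v): Q(u, v)}."""
    q_v = defaultdict(float)
    for (_u, v), p in joint.items():
        q_v[v] += p
    inner = defaultdict(float)               # inner[v] = sum_u Q(u|v)^(1/(1+rho))
    for (_u, v), p in joint.items():
        if p > 0.0:
            inner[v] += (p / q_v[v]) ** (1.0 / (1.0 + rho))
    return math.log2(sum(q_v[v] * inner[v] ** (1.0 + rho) for v in q_v))


def bsc_side_info(eps):
    """U ~ Bernoulli(1/2) and V = U passed through a BSC(eps)."""
    return {(u, v): 0.5 * (eps if u != v else 1.0 - eps) for u in (0, 1) for v in (0, 1)}


def e_si_closed_form(rho, eps):
    a = 1.0 / (1.0 + rho)
    return (1.0 + rho) * math.log2(eps ** a + (1.0 - eps) ** a)


eps = 0.1
for rho in (0.25, 0.5, 1.0):
    print(rho, round(e_si(rho, bsc_side_info(eps)), 6), round(e_si_closed_form(rho, eps), 6))
```

The general function works for any finite joint PMF, so it can also be used for side-information channels that are not symmetric, where no closed form of this kind is available.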

V. Conclusion 

In this paper, a scheme was described for the problem of joint source-channel coding with side information 
available only at the decoder. If the channel is noiseless, one immediately arrives at a scheme for (almost) lossless 
compression with side information at the decoder only. The coding is done in a 'streaming' manner in the sense that 
source symbols are encoded as they arrive. The encoder consists of an infinite constraint length random time-varying 
convolutional code, and the decoder is a Stack Algorithm sequential decoder with a variable 'bias' parameter. 

Two performance measures were bounded for this system when coding IID sources over DMCs: the probability of error with end-to-end delay and the (average) computational effort of the decoder. We showed that various analogs of Gallager's random coding error exponent could be achieved by a suitable choice of bias. We also bounded the $\rho$-th moment of computation for $0 < \rho \le 1$. We thus established a lower bound on the cutoff rate for moments up to the mean for sequential decoding with side information. One would expect that a tweak to the analysis of [28], allowing for side information, would establish the matching upper bounds on the cutoff rate.
Fig. 10. Determining E(R) for example 4.2.

[Figure 11: random variable of computation plotted against $\log_2(N)$, annotated with Experimental Pareto Exponent = 1.2]
Fig. 11. Determining the Pareto exponent for computation for example 4.2.



Following the work of Koshelev [12], it may even be possible to allow for finite-memory Markov sources. Another important extension would be to consider two distributed encoders as in the paper of Slepian and Wolf [1], i.e., the case when the side information $V$ is also coded and must be reconstructed. The scheme of Section III-B naturally allows for this by adding another tree code for the other source and modifying the metric update slightly. Simulation results have shown that the computational cost seems to be prohibitive except at high rates. Indeed, even the random coding exponents for correlated sources are generally much lower when both sources are coded [31]. Perhaps this is not surprising, considering that the computational cutoff rate is closely tied to the 'Gallager' $E_s$ function, indirectly through the random coding error exponent.
Fig. 12. Joint source-channel coding with side information available to the decoder.



VI. Appendix - Proofs 

In this section we prove the theorems of the paper. First, we show that the probability of error goes to zero exponentially with delay. This is done initially in the case when there is only a noiseless channel and the source is encoded at some rate $R$ bits per time unit. Then, we prove this for joint source-channel coding with side information when the source and the channel are 'synchronized' at one source symbol per channel use. Next, we prove the theorems regarding the random variable of computation. Again, we do this first in the case of source coding with side information, and then for joint source-channel coding with side information. Before diving into the individual proofs, we first examine the error events that show up.^7

Assume that $R$ is an integer^8 so that we need not worry about integer effects in the exposition; the results hold for non-integer rates as well. Similarly, assume in the proofs of Theorems 2 and 4 that $\lambda$ is an integer.

A. Error events 

A source produces IID letters $(U_i, V_i)$ according to a joint distribution $Q(u,v)$ on a discrete alphabet $\mathcal U \times \mathcal V$. The $U_i$ are available to an encoder, and the $V_i$ are given to the decoder as side information. In the case of joint source-channel coding, there is a discrete memoryless channel with probability transition matrix $W(y|x)$ with finite input and output alphabets. We use the encoder and decoder of Section III-B. For joint source-channel coding, we assume there is one channel use for every source symbol. We denote vectors as $u_1^d, y_1^n, \ldots$, etc. We reserve the letters $u$, $x$, and $y$ for the 'true' variables and $\tilde u$, $\tilde x$ for arbitrary 'false' variables.

The probability measure $P$ will refer to all randomness in the source as well as the randomly generated encoder. When no confusion arises, $Q$ will be applied to multiple symbols like $u_1^n$ with the meaning that $Q(u_1^n) = \prod_{i=1}^n Q(u_i)$.

The stack algorithm uses an additive metric (implicitly a function of the side information, the tree code, and the channel outputs, if there are any) whose increment for each symbol is

$$\log_2 \frac{Q(u_i|v_i)\, W(y_i|x_i(u_1^i))}{P(y_i)} + G$$

for some bias $G \in \mathbb R$, where $P(y) = \sum_x R(x) W(y|x)$ and $R$ is the distribution from which channel inputs are generated. If there is no channel, the increment is $\log_2 Q(u_i|v_i) + G$ if the parities of the sequence match the parities received by the decoder. Otherwise, we can set the metric for non-matching parities to $-\infty$ to effectively drop them out of the

7 The appendix is lengthy and somewhat redundant for the convenience of the reviewer and will be trimmed for the final version. 
8 For a non-integer rate, the encoder outputs either $\lfloor R \rfloor$ or $\lceil R \rceil$ bits at every time instant. The integer effect is not important asymptotically, and for convenience we have used the integer assumption in the proofs. In simulations, we have used non-integer rates.
stack. We now consider how the stack decoder could follow a false path. We say the stack decoder 'visits' a node if it computes a metric for that node.
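To make the decoding rule concrete, the following is a minimal Python sketch of a stack decoder for the noiseless-channel case. It rests on our own simplifying assumptions:

- a binary source, with side information obtained by passing it through a BSC;
- a random tree code whose parities are drawn lazily and independently for every node;
- a rate of $R$ bits per symbol, emitted as $\lfloor iR \rfloor - \lfloor (i-1)R \rfloor$ bits at time $i$ (cf. footnote 8);
- decoding of a fixed horizon $n$ rather than true streaming.

The names `RandomTreeCode`, `encode` and `stack_decode` are hypothetical. A node whose parities disagree with the received bits gets metric $-\infty$ and is never pushed onto the stack. Every metric evaluation counts as a visit.

```python
import heapq
import math
import random
from fractions import Fraction


class RandomTreeCode:
    """Hypothetical random time-varying tree code with infinite constraint length:
    each node (source prefix) gets its own i.i.d. uniform parity bits, drawn lazily."""

    def __init__(self, rate: Fraction, seed: int):
        self.rate = rate
        self.rng = random.Random(seed)
        self.table = {}

    def bits_at(self, i: int) -> int:
        # floor(iR) - floor((i-1)R): either floor(R) or ceil(R) bits at time i.
        return math.floor(i * self.rate) - math.floor((i - 1) * self.rate)

    def parities(self, prefix: tuple) -> tuple:
        if prefix not in self.table:
            k = self.bits_at(len(prefix))
            self.table[prefix] = tuple(self.rng.randrange(2) for _ in range(k))
        return self.table[prefix]


def encode(code, u):
    """Parities sent at times 1..n for the source string u."""
    return [code.parities(tuple(u[: i + 1])) for i in range(len(u))]


def stack_decode(code, received, v, q_cond, bias, max_visits=10**6):
    """Repeatedly extend the stored node with the largest metric. The branch metric is
    log2 Q(u_i|v_i) + bias if the node's parities match the received bits; otherwise
    the metric is -inf and the node is dropped."""
    n = len(received)
    heap = [(0.0, 0, ())]                # entries: (-metric, tie-break counter, prefix)
    counter, visits = 1, 0
    while heap:
        neg_metric, _, prefix = heapq.heappop(heap)
        if len(prefix) == n:
            return list(prefix), visits
        i = len(prefix)
        for sym in (0, 1):
            child = prefix + (sym,)
            visits += 1                  # a node is visited when its metric is computed
            if visits > max_visits:
                return None, visits
            if code.parities(child) != received[i]:
                continue                 # metric -inf: never enters the stack
            metric = -neg_metric + math.log2(q_cond[(sym, v[i])]) + bias
            heapq.heappush(heap, (-metric, counter, child))
            counter += 1
    return None, visits


# Illustrative parameters (not those of the simulations in Section IV).
eps, rate, bias, n = 0.05, Fraction(4, 5), 0.8, 40
rng = random.Random(1)
u = [rng.randrange(2) for _ in range(n)]
v = [b ^ int(rng.random() < eps) for b in u]           # side information through a BSC
q_cond = {(a, b): (1 - eps if a == b else eps) for a in (0, 1) for b in (0, 1)}

code = RandomTreeCode(rate, seed=7)
received = encode(code, u)
u_hat, visits = stack_decode(code, received, v, q_cond, bias)
if u_hat is not None:
    print("visits:", visits, "symbol errors:", sum(a != b for a, b in zip(u, u_hat)))
```

The example uses $\epsilon = 0.05$, $R = 0.8$ and $G = 0.8$. By our own calculation, this bias lies inside the interval of Theorem 3 for $\gamma = 1$, roughly $0.52 < G < 1.08$. A heap keyed on the negative metric is used because Python's `heapq` is a min-heap.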

Suppose the true source sequence is $(u_1^n, v_1^n)$ until time $n$ and $\tilde u_1^n$ is some other arbitrary source sequence. Viewed as paths through the encoding tree, $u_1^n$ and $\tilde u_1^n$ are the same if and only if they trace the same path from the root to depth $n$ in the tree. If they are not the same, there is some earliest point at which they diverge; call that time $n-d+1$. Equivalently, $\tilde u_1^{n-d} = u_1^{n-d}$, but $\tilde u_{n-d+1} \ne u_{n-d+1}$. Up to time $n-d$, because the stack decoder is a sequential decoder, the stack algorithm assigns $\tilde u_1^{n-d}$ and $u_1^{n-d}$ the same metric. In order for $\tilde u_1^n$ to be the decoder's estimated path at time $n$, a necessary condition is:

$$\Gamma(\tilde u_1^n) \ge \min_{n-d+1 \le k \le n} \Gamma(u_1^k) \tag{73}$$

Noting that $\Gamma(\tilde u_1^{n-d}) = \Gamma(u_1^{n-d})$, and the fact that the metric is additive, this reduces to:

$$\Gamma(\tilde u_{n-d+1}^n) \ge \min_{n-d+1 \le k \le n} \Gamma(u_{n-d+1}^k) \tag{74}$$

All randomness in the source, encoder/decoder, and channel is memoryless and stationary, so the probability of the above event occurring for some false $\tilde u_{n-d+1}^n$ is the same as the probability of the event $F_d$ defined below:

$$F_d = \Big\{\exists\, \tilde u_1^d \in \mathcal U^d : \tilde u_1 \ne u_1,\ \Gamma(\tilde u_1^d) \ge \min_{1\le k\le d} \Gamma(u_1^k)\Big\} \tag{75}$$

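We call $F_d$ the error event of depth $d$. Figure III-D shows paths that may lead to an error event of depth 3 occurring, i.e., $F_3$. We can further break up $F_d$ into sub-events $F_{d,k}$ so that:

$$F_{d,k} = \big\{\exists\, \tilde u_1^d \in \mathcal U^d : \tilde u_1 \ne u_1,\ \Gamma(\tilde u_1^d) \ge \Gamma(u_1^k)\big\} \tag{76}$$

$$P(F_d) \le \sum_{k=1}^d P(F_{d,k}) \tag{77}$$

$$P(F_{d,k}) = E\Big[\mathbf 1\big(\exists\, \tilde u_1^d \in \mathcal U^d : \tilde u_1 \ne u_1,\ \Gamma(\tilde u_1^d) \ge \Gamma(u_1^k)\big)\Big] \tag{78}$$

$$\le E\Big[\Big(\sum_{\tilde u_1^d:\, \tilde u_1 \ne u_1} \mathbf 1\big(\Gamma(\tilde u_1^d) \ge \Gamma(u_1^k)\big)\Big)^\rho\Big] \quad \forall \rho \in [0,1] \tag{79}$$

Here $\mathbf 1(\cdot)$ denotes the indicator function of its argument. The last line is in fact true for any $\rho > 0$, but it is only useful in bounding if $\rho \in [0,1]$.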

The probability of error with delay $d$ at time $n$ is $P(\hat u_1^{n-d}(n) \ne u_1^{n-d})$, where $\hat u_1^{n-d}(n)$ is the decoder's estimate of the source from time 1 to $n-d$ produced at time $n$. We will give an upper bound on this probability that is independent of $n$ and depends only on $d$, which is therefore an upper bound on $P_e(d)$.

If $\hat u_1^{n-d}(n) \ne u_1^{n-d}$, then there is some point at which they diverged, say after time $n-d-l$. So $\hat u_1^{n-d-l}(n) = u_1^{n-d-l}$, but $\hat u_{n-d-l+1}(n) \ne u_{n-d-l+1}$. The probability that a false decoded path and the true path diverged at time $n-d-l$ is at most $P(F_{d+l})$. Now we can use the union bound to get:

$$P(\hat u_1^{n-d}(n) \ne u_1^{n-d}) \le \sum_{l=0}^{n-d} P(F_{d+l}) \tag{80}$$

To get a bound independent of $n$, we just let $n$ go to infinity and get

$$P_e(d) \le \sum_{l=0}^{\infty} P(F_{d+l}) \tag{81}$$
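As for the random variable of computation, we define a generic variable $N$ below:

$$N = \sum_{l=1}^{\infty} \sum_{\tilde u_1^l:\, \tilde u_1 \ne u_1} \mathbf 1(\tilde u_1^l \text{ is visited}) \tag{82}$$

$$\le \sum_{l=1}^{\infty} \sum_{\tilde u_1^l:\, \tilde u_1 \ne u_1} \mathbf 1\Big(\Gamma(\tilde u_1^l) \ge \min_{k \ge 1} \Gamma(u_1^k)\Big) \tag{83}$$

By symmetry, it is clear that $E[N_i^\gamma] = E[N^\gamma]$ for all $i \ge 1$ and any $\gamma > 0$. We want to find when $E[N^\gamma] < \infty$. Bounding the indicator of the minimum by a sum over $k$, and then using the concavity of $x \mapsto x^\gamma$ for $\gamma \in [0,1]$ (which gives $(\sum_i a_i)^\gamma \le \sum_i a_i^\gamma$ for $a_i \ge 0$), we have

$$E[N^\gamma] \le E\Big[\Big(\sum_{l=1}^\infty \sum_{k=1}^\infty \sum_{\tilde u_1^l:\, \tilde u_1 \ne u_1} \mathbf 1\big(\Gamma(\tilde u_1^l) \ge \Gamma(u_1^k)\big)\Big)^\gamma\Big] \tag{84}$$

$$\le \sum_{l=1}^\infty \sum_{k=1}^\infty E\Big[\Big(\sum_{\tilde u_1^l:\, \tilde u_1 \ne u_1} \mathbf 1\big(\Gamma(\tilde u_1^l) \ge \Gamma(u_1^k)\big)\Big)^\gamma\Big] \tag{85}$$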
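Two elementary facts carry most of the weight in (79) and (85). First, for $\rho \in [0,1]$, the indicator that at least one term is nonzero is at most (number of nonzero terms)$^\rho$. Second, $x \mapsto x^\gamma$ is subadditive on the nonnegative reals: $(\sum_i a_i)^\gamma \le \sum_i a_i^\gamma$. The randomized Python check below is only a sanity check of these two inequalities, not a proof. The function name is ours.

```python
import random


def check_elementary_bounds(trials=10000, seed=0):
    """Randomized check of the two inequalities behind (79) and (85)."""
    rng = random.Random(seed)
    for _ in range(trials):
        g = rng.random()                                  # exponent in [0, 1)
        a = [rng.random() * rng.choice([0, 1, 10]) for _ in range(rng.randint(1, 6))]
        # Subadditivity of x -> x**g for g in [0, 1] (used for (85)).
        if sum(a) ** g > sum(x ** g for x in a) + 1e-12:
            return False
        # 1(some term positive) <= (number of positive terms)**g (used for (79)).
        count = sum(1 for x in a if x > 0)
        if (1 if count > 0 else 0) > count ** g:
            return False
    return True


print(check_elementary_bounds())
```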



Here are some further facts and definitions that are used repeatedly in the appendix:

1) The source and channel are memoryless. The parity generation process and the channel input generation process are IID for every branch/node.

2) Jensen's inequality. If $X$ is a random variable and $f$ is a concave $\cap$ function, $E[f(X)] \le f(E[X])$. If $\rho \in [0, 1]$, $f(x) = x^\rho$ is concave.

3) By definition, for each $y \in \mathcal Y$, $P(y) = \sum_{x\in\mathcal X} R(x) W(y|x)$.

4) Definitions of the exponent functions $E_{si}$, $E_0$, etc. can be found in Section III-A.

5) Sums and products of probabilities commute, and changing dummy variables can be used to simplify terms. See Gallager [16], Chapter 5.

B. Probability of error - source coding with side information

Theorem 5 (Restatement of Theorem 1): Suppose that the decoder has access to the side information and there is a noiseless rate $R$ binary channel between the encoder and decoder. Fix any $\epsilon > 0$ and let $\rho \in (0,1]$. For the

9 Sums of the form $\sum_{u_1^l}$ mean summing over all $u_1^l \in \mathcal U^l$. This is the meaning for all sums in the appendix, unless an additional condition such as $\tilde u_1 \ne u_1$ is explicitly stated.



February 1, 2008 



DRAFT 



24 



encoder/decoder of Section III-B, if the bias $G$ satisfies

$$G < \frac{1+\rho}{\rho}\big(E_{si}(\rho) - F_{si}(\rho)\big) \tag{89}$$

then there is a constant $K_\epsilon < \infty$ so that

$$P_e(d) \le K_\epsilon \exp_2\big(-d(\rho R - E_{si}(\rho) - \epsilon)\big) \tag{90}$$

Hence, with suitable choice of bias, the error exponent with delay can be

$$E(R) = E_{r,si}(R) = \sup_{\rho\in[0,1]} \rho R - E_{si}(\rho) \tag{91}$$

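Proof: The letter $B$ will be used for the bits received by the decoder, which will be referred to as 'parities'. We can specialize the event $F_d$ to this situation and write it as:

$$F_d = \Big\{\exists\, \tilde u_1^d,\ \tilde u_1 \ne u_1 : \Gamma(\tilde u_1^d) \ge \min_{1\le k\le d} \Gamma(u_1^k) \text{ and the parities of } \tilde u_1^d \text{ match } B_1^{dR}\Big\} \tag{92}$$

The event $F_d$ can be subdivided into events $F_{d,k}$ so that $F_d = \bigcup_{k=1}^d F_{d,k}$, where

$$F_{d,k} = \Big\{\exists\, \tilde u_1^d,\ \tilde u_1 \ne u_1 : \Gamma(\tilde u_1^d) \ge \Gamma(u_1^k) \text{ and the parities of } \tilde u_1^d \text{ match } B_1^{dR}\Big\} \tag{93}$$

Suppose $\tilde u_1^d$ is a false path that causes $F_{d,k}$ to occur. This means its parities match the received bits and its metric $\Gamma(\tilde u_1^d)$ is at least $\Gamma(u_1^k)$. Therefore,

$$0 \le \Gamma(\tilde u_1^d) - \Gamma(u_1^k) \tag{94}$$

$$= \sum_{i=1}^d \big(\log_2 Q(\tilde u_i|v_i) + G\big) - \sum_{i=1}^k \big(\log_2 Q(u_i|v_i) + G\big)$$

$$= \sum_{i=1}^k \log_2 \frac{Q(\tilde u_i|v_i)}{Q(u_i|v_i)} + \sum_{i=k+1}^d \log_2 Q(\tilde u_i|v_i) + (d-k)G \triangleq \Delta_k(\tilde u_1^d, u_1^k, v_1^d) \tag{95}$$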

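Using a Gallager-style union bound, for $\rho \in [0, 1]$, we have

$$P(F_{d,k}) \le E\Big[\Big(\sum_{\tilde u_1^d:\, \tilde u_1 \ne u_1} \mathbf 1(\tilde u_1^d \text{ causes } F_{d,k} \text{ to occur})\Big)^\rho\Big] \tag{96}$$

$$= E\Big[E\Big[\Big(\sum_{\tilde u_1^d:\, \tilde u_1 \ne u_1} \mathbf 1(\tilde u_1^d \text{ causes } F_{d,k} \text{ to occur})\Big)^\rho \,\Big|\, U_1^d, V_1^d\Big]\Big] \tag{97}$$

$$\overset{(a)}{\le} E\Big[\Big(\sum_{\tilde u_1^d:\, \tilde u_1 \ne u_1} P(\tilde u_1^d \text{ causes } F_{d,k} \text{ to occur} \mid U_1^d, V_1^d)\Big)^\rho\Big] \tag{98}$$

Here, (a) is by Jensen's inequality. By conditioning on the source sequence and applying the union bound, we get

$$P(F_d) = \sum_{u_1^d \in \mathcal U^d,\, v_1^d \in \mathcal V^d} Q(u_1^d, v_1^d)\, P(F_d \mid u_1^d, v_1^d) \tag{99}$$

$$\le \sum_{k=1}^d \sum_{u_1^d, v_1^d} Q(u_1^d, v_1^d)\, P(F_{d,k} \mid u_1^d, v_1^d) \tag{100}$$

$$\le \sum_{k=1}^d \sum_{u_1^d, v_1^d} Q(u_1^d, v_1^d)\, E\Big[\Big(\sum_{\tilde u_1^d:\, \tilde u_1 \ne u_1} \mathbf 1(\tilde u_1^d \text{ causes } F_{d,k} \text{ to occur})\Big)^\rho \,\Big|\, u_1^d, v_1^d\Big] \tag{101}$$

$$\le \sum_{k=1}^d \sum_{u_1^d, v_1^d} Q(u_1^d, v_1^d) \Big(E\Big[\sum_{\tilde u_1^d:\, \tilde u_1 \ne u_1} \mathbf 1(\tilde u_1^d \text{ causes } F_{d,k} \text{ to occur}) \,\Big|\, u_1^d, v_1^d\Big]\Big)^\rho \tag{102}$$

$$= \sum_{k=1}^d \sum_{u_1^d, v_1^d} Q(u_1^d, v_1^d) \Big(\sum_{\tilde u_1^d:\, \tilde u_1 \ne u_1} P(\tilde u_1^d \text{ causes } F_{d,k} \text{ to occur} \mid u_1^d, v_1^d)\Big)^\rho \tag{103}$$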

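Continuing with the bounding, we use the fact that the parity generation process is independent of everything else^10 to get

$$P(\tilde u_1^d \text{ causes } F_{d,k} \text{ to occur} \mid u_1^d, v_1^d) = E\big[\mathbf 1(\text{parities of } \tilde u_1^d \text{ match } B_1^{dR}) \cdot \mathbf 1(\Gamma(\tilde u_1^d) \ge \Gamma(u_1^k)) \,\big|\, u_1^d, v_1^d\big] \tag{104}$$

$$= E\big[\mathbf 1(\text{parities of } \tilde u_1^d \text{ match } B_1^{dR})\big] \cdot \mathbf 1\big(\Delta_k(\tilde u_1^d, u_1^k, v_1^d) \ge 0\big) \tag{105}$$

$$= 2^{-dR}\, \mathbf 1\big(\Delta_k(\tilde u_1^d, u_1^k, v_1^d) \ge 0\big) \tag{106}$$

$$\le 2^{-dR} \exp_2\big(s\,\Delta_k(\tilde u_1^d, u_1^k, v_1^d)\big) \tag{107}$$

$$= 2^{-dR} \exp_2\Big(s\Big[\log_2 \frac{Q(\tilde u_1^k|v_1^k)}{Q(u_1^k|v_1^k)} + \log_2 Q(\tilde u_{k+1}^d|v_{k+1}^d) + (d-k)G\Big]\Big) \tag{108}$$

for any $s > 0$. Substituting for $\Delta_k(\tilde u_1^d, u_1^k, v_1^d)$, and removing the restriction that $\tilde u_1 \ne u_1$ (which can only increase the sum), the inner term of (103) satisfies

$$\Big(\sum_{\tilde u_1^d:\, \tilde u_1 \ne u_1} P(\tilde u_1^d \text{ causes } F_{d,k} \mid u_1^d, v_1^d)\Big)^\rho \le \Big(\sum_{\tilde u_1^d} 2^{-dR} \exp_2\Big(s\Big[\log_2 \frac{Q(\tilde u_1^k|v_1^k)}{Q(u_1^k|v_1^k)} + \log_2 Q(\tilde u_{k+1}^d|v_{k+1}^d) + (d-k)G\Big]\Big)\Big)^\rho \tag{109}$$

$$= 2^{-d\rho R + (d-k)s\rho G}\Big(\sum_{\tilde u_1^d} \Big(\frac{Q(\tilde u_1^k|v_1^k)}{Q(u_1^k|v_1^k)}\Big)^s Q(\tilde u_{k+1}^d|v_{k+1}^d)^s\Big)^\rho \tag{110}$$

$$\overset{(b)}{=} 2^{-d\rho R + (d-k)s\rho G}\Big(\sum_{\tilde u_1^k} \Big(\frac{Q(\tilde u_1^k|v_1^k)}{Q(u_1^k|v_1^k)}\Big)^s\Big)^\rho \Big(\sum_{\tilde u_{k+1}^d} Q(\tilde u_{k+1}^d|v_{k+1}^d)^s\Big)^\rho \tag{111}$$

Relation (b) follows from the standard algebra of interchanging sums and products. Now, we substitute the last line into (103).

10 Only pairwise independence of the parities along two disjoint paths is required.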



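$$P(F_d) \le \sum_{k=1}^d 2^{-d\rho R + (d-k)s\rho G} \sum_{u_1^d, v_1^d} Q(u_1^d, v_1^d)\Big(\sum_{\tilde u_1^k}\Big(\frac{Q(\tilde u_1^k|v_1^k)}{Q(u_1^k|v_1^k)}\Big)^s\Big)^\rho\Big(\sum_{\tilde u_{k+1}^d} Q(\tilde u_{k+1}^d|v_{k+1}^d)^s\Big)^\rho \tag{112}$$

$$= \sum_{k=1}^d 2^{-d\rho R + (d-k)s\rho G}\Big[\sum_{v_1^k} Q(v_1^k)\sum_{u_1^k} Q(u_1^k|v_1^k)^{1-s\rho}\Big(\sum_{\tilde u_1^k} Q(\tilde u_1^k|v_1^k)^s\Big)^\rho\Big]\Big[\sum_{v_{k+1}^d} Q(v_{k+1}^d)\sum_{u_{k+1}^d} Q(u_{k+1}^d|v_{k+1}^d)\Big(\sum_{\tilde u_{k+1}^d} Q(\tilde u_{k+1}^d|v_{k+1}^d)^s\Big)^\rho\Big] \tag{113}$$

$$= \sum_{k=1}^d 2^{-d\rho R + (d-k)s\rho G}\Big[\sum_{v_1^k} Q(v_1^k)\sum_{u_1^k} Q(u_1^k|v_1^k)^{1-s\rho}\Big(\sum_{\tilde u_1^k} Q(\tilde u_1^k|v_1^k)^s\Big)^\rho\Big]\Big[\sum_{v_{k+1}^d} Q(v_{k+1}^d)\Big(\sum_{\tilde u_{k+1}^d} Q(\tilde u_{k+1}^d|v_{k+1}^d)^s\Big)^\rho\Big] \tag{114}$$

$$\overset{(c)}{=} \sum_{k=1}^d 2^{-d\rho R + (d-k)\frac{\rho}{1+\rho}G}\Big[\sum_{v_1^k} Q(v_1^k)\Big(\sum_{u_1^k} Q(u_1^k|v_1^k)^{\frac{1}{1+\rho}}\Big)^{1+\rho}\Big]\Big[\sum_{v_{k+1}^d} Q(v_{k+1}^d)\Big(\sum_{u_{k+1}^d} Q(u_{k+1}^d|v_{k+1}^d)^{\frac{1}{1+\rho}}\Big)^\rho\Big] \tag{115}$$

We get (c) by noting that the $\tilde u$'s are just dummy variables, so we are free to replace them with $u$'s, and then setting $s = \frac{1}{1+\rho}$. Next, we use the IID property of the source along with some algebra to get to an exponential form. For example, we have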

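$$\sum_{v_1^k} Q(v_1^k)\Big(\sum_{u_1^k} Q(u_1^k|v_1^k)^{\frac{1}{1+\rho}}\Big)^{1+\rho} = \sum_{v_1}\cdots\sum_{v_k} \prod_{l=1}^k Q(v_l)\Big(\sum_{u_1}\cdots\sum_{u_k} \prod_{l=1}^k Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big)^{1+\rho} \tag{116}$$

$$= \prod_{l=1}^k \sum_{v_l} Q(v_l)\Big(\sum_{u_l} Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big)^{1+\rho} \tag{117}$$

$$= \Big(\sum_{v} Q(v)\Big(\sum_{u} Q(u|v)^{\frac{1}{1+\rho}}\Big)^{1+\rho}\Big)^k \tag{118}$$

Similarly,

$$\sum_{v_{k+1}^d} Q(v_{k+1}^d)\Big(\sum_{u_{k+1}^d} Q(u_{k+1}^d|v_{k+1}^d)^{\frac{1}{1+\rho}}\Big)^{\rho} = \sum_{v_{k+1}}\cdots\sum_{v_d} \prod_{l=k+1}^d Q(v_l)\Big(\sum_{u_{k+1}}\cdots\sum_{u_d} \prod_{l=k+1}^d Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big)^{\rho} \tag{119}$$

$$= \prod_{l=k+1}^d \sum_{v_l} Q(v_l)\Big(\sum_{u_l} Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big)^{\rho} \tag{120}$$

$$= \Big(\sum_{v} Q(v)\Big(\sum_{u} Q(u|v)^{\frac{1}{1+\rho}}\Big)^{\rho}\Big)^{d-k} \tag{121}$$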
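Using the definitions of $E_{si}(\rho)$ and $F_{si}(\rho)$, we can rewrite the bound as:

$$P(F_d) \le \sum_{k=1}^d \exp_2\Big(-d\rho R + (d-k)\frac{\rho}{1+\rho}G + kE_{si}(\rho) + (d-k)F_{si}(\rho)\Big) \tag{122}$$

$$= \exp_2\Big(d\Big(\frac{\rho}{1+\rho}G + F_{si}(\rho) - \rho R\Big)\Big)\sum_{k=1}^d \exp_2\Big(k\Big(E_{si}(\rho) - F_{si}(\rho) - \frac{\rho}{1+\rho}G\Big)\Big) \tag{123}$$

By the assumption of the theorem, (89), the following condition holds:

$$E_{si}(\rho) - F_{si}(\rho) - \frac{\rho}{1+\rho}G > 0 \tag{124}$$

Then, we can simplify the bound to

$$P(F_d) \le \exp_2\Big(d\Big(\frac{\rho}{1+\rho}G + F_{si}(\rho) - \rho R\Big)\Big)\sum_{k=1}^d \exp_2\Big(k\Big(E_{si}(\rho) - F_{si}(\rho) - \frac{\rho}{1+\rho}G\Big)\Big) \tag{125}$$

$$\overset{(d)}{\le} \exp_2\Big(d\Big(\frac{\rho}{1+\rho}G + F_{si}(\rho) - \rho R\Big)\Big)\, d\, \exp_2\Big(d\Big(E_{si}(\rho) - F_{si}(\rho) - \frac{\rho}{1+\rho}G\Big)\Big) \tag{126}$$

$$= d \exp_2\big(-d(\rho R - E_{si}(\rho))\big) \tag{127}$$

We get (d) from noting that, since every exponent in the sum is positive, the sum of the geometric series can be upper bounded by $d$ times its largest (last) term.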
This holds for any $\rho \in (0,1]$ satisfying (89). Since $d \le K_\epsilon 2^{d\epsilon}$ for every $d \ge 1$, we get

$$P(F_d) \le K_\epsilon \exp_2\big(-d(\rho R - E_{si}(\rho) - \epsilon)\big) \tag{128}$$

$$K_\epsilon = \max\Big\{d : \frac{\log_2 d}{d} > \epsilon\Big\} < \infty \tag{129}$$

Note that $K_\epsilon < \infty$ and is independent of $d$ because $\ln(d)/d$ goes to 0. We note that $E_{si}(\rho)$ is a differentiable function for all $\rho \ge 0$, with $E'_{si}(0) = H(U|V)$ (see [18]); that is, the slope at 0 is the conditional entropy of the source $U$ given the side information $V$. $E_{si}(\rho)$ is the source coding with side information analog of Gallager's function $E_0(\rho)$. While Gallager's function may be non-differentiable at some points, because it is the maximization of a function over probability distributions, $E_{si}(\rho)$ does not suffer from this problem.

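Now, assuming the bias satisfies the required condition, we have

$$P_e(d) \le \sum_{k=0}^\infty P(F_{d+k}) \tag{130}$$

$$\le \sum_{k=0}^\infty K_\epsilon 2^{-(d+k)(\rho R - E_{si}(\rho) - \epsilon)} \tag{131}$$

$$= 2^{-d(\rho R - E_{si}(\rho) - \epsilon)} \sum_{k=0}^\infty K_\epsilon 2^{-k(\rho R - E_{si}(\rho) - \epsilon)} \tag{132}$$

Choosing $\epsilon$ small enough that $\rho R - E_{si}(\rho) - \epsilon > 0$ (possible whenever $\rho R > E_{si}(\rho)$), the geometric series converges and we have

$$P_e(d) \le \frac{K_\epsilon}{1 - 2^{-(\rho R - E_{si}(\rho) - \epsilon)}}\, 2^{-d(\rho R - E_{si}(\rho) - \epsilon)} \tag{133}$$

$$= K'_\epsilon \exp_2\big(-d(\rho R - E_{si}(\rho) - \epsilon)\big) \tag{134}$$

This is true for every $\rho \in (0,1]$ (with the bias chosen to satisfy (89)), so $E(R) \ge E_{r,si}(R) = \sup_{\rho\in[0,1]} \rho R - E_{si}(\rho)$. ■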
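Evaluating $E_{r,si}(R) = \sup_{\rho\in[0,1]} \rho R - E_{si}(\rho)$ only requires a one-dimensional search. The sketch below uses a uniform grid over $\rho$, a simple choice of ours that is adequate because $E_{si}$ is smooth. It is applied to the BSC side-information source of Example 4.2 with $\epsilon = 0.1$. At $R = 0.7$ it gives a value of about 0.05. For any $R$ below $H(U|V) = h_b(0.1) \approx 0.469$, the supremum is attained at $\rho = 0$ and the exponent is 0.

```python
import math


def e_si_bsc(rho, eps):
    """E_si(rho) for the BSC side-information source of Example 4.2."""
    a = 1.0 / (1.0 + rho)
    return (1.0 + rho) * math.log2(eps ** a + (1.0 - eps) ** a)


def random_coding_exponent(rate, eps, points=10001):
    """Grid approximation of E_{r,si}(R) = sup_{rho in [0,1]} rho*R - E_si(rho)."""
    grid = (k / (points - 1) for k in range(points))
    return max(rho * rate - e_si_bsc(rho, eps) for rho in grid)


for rate in (0.4, 0.6, 0.7, 0.9):
    print(rate, round(random_coding_exponent(rate, 0.1), 4))
```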
C. Probability of error - joint source channel coding with side information

Theorem 6 (Restatement of Theorem 2): Suppose there is a channel $W$ between the encoder and the decoder and side information is available to the decoder. Fix any $\epsilon > 0$ and let $\rho \in (0,1]$. For the encoder/decoder of Sections III-B and III-C, if the bias $G$ satisfies

$$G < \frac{1+\rho}{\rho}\big(E_{si}(\rho) - F_{si}(\rho) - \lambda E_0(\rho) + \lambda F(\rho)\big) \tag{135}$$

then there is a constant $K_\epsilon < \infty$ so that

$$P_e(d) \le K_\epsilon \exp_2\big(-d(\lambda E_0(\rho) - E_{si}(\rho) - \epsilon)\big) \tag{136}$$

Hence, with suitable choice of bias, the error exponent with delay can be

$$E(\lambda) = E_{r,jscsi}(\lambda) = \sup_{\rho\in[0,1]} \lambda E_0(\rho) - E_{si}(\rho) \tag{137}$$
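Proof: We will prove this for $\lambda = 1$ and then show how the proof changes for other $\lambda$. As in the previous proof, $P_e(d)$ can be bounded by $\sum_{l=0}^\infty P(F_{d+l})$, and $P(F_d)$ can be bounded by $\sum_{k=1}^d P(F_{d,k})$. So we start by bounding $P(F_{d,k})$. First, condition on the true source sequence, channel inputs and channel outputs:

$$P(F_{d,k}) \le \sum_{u_1^d, v_1^d, x_1^d, y_1^d} Q(u_1^d, v_1^d) R(x_1^d) W(y_1^d|x_1^d)\, E\Big[\Big(\sum_{\tilde u_1^d \in \mathcal U^d:\, \tilde u_1 \ne u_1} \mathbf 1\big(\Gamma(\tilde u_1^d) \ge \Gamma(u_1^k)\big)\Big)^\rho\Big] \tag{138}$$

$$\le \sum_{u_1^d, v_1^d, x_1^d, y_1^d} Q(u_1^d, v_1^d) R(x_1^d) W(y_1^d|x_1^d) \Big(E\Big[\sum_{\tilde u_1^d \in \mathcal U^d:\, \tilde u_1 \ne u_1} \mathbf 1\big(\Gamma(\tilde u_1^d) \ge \Gamma(u_1^k)\big)\Big]\Big)^\rho \tag{139}$$

The last step is true by Jensen's inequality. Now the only thing that is random in the expectation is the channel symbols used on the false $\tilde u$ paths:

$$P(F_{d,k}) \le \sum_{u_1^d, v_1^d, x_1^d, y_1^d} Q(u_1^d, v_1^d) R(x_1^d) W(y_1^d|x_1^d) \Big(\sum_{\tilde u_1^d:\, \tilde u_1 \ne u_1}\ \sum_{\tilde x_1^d} R(\tilde x_1^d)\, \mathbf 1\big(\Gamma(\tilde u_1^d) \ge \Gamma(u_1^k)\big)\Big)^\rho \tag{140}$$

Now, we also have for all $s > 0$,

$$\mathbf 1\big(\Gamma(\tilde u_1^d) \ge \Gamma(u_1^k)\big) \le \exp_2\big(s(\Gamma(\tilde u_1^d) - \Gamma(u_1^k))\big) \tag{141}$$

Since

$$\Gamma(\tilde u_1^d) - \Gamma(u_1^k) = \log_2 \frac{Q(\tilde u_1^d|v_1^d)\, W(y_1^d|\tilde x_1^d)}{P(y_1^d)} - \log_2 \frac{Q(u_1^k|v_1^k)\, W(y_1^k|x_1^k)}{P(y_1^k)} + (d-k)G, \tag{142}$$

we get

$$\mathbf 1\big(\Gamma(\tilde u_1^d) \ge \Gamma(u_1^k)\big) \le \Big(\frac{Q(\tilde u_1^d|v_1^d)\, W(y_1^d|\tilde x_1^d)}{Q(u_1^k|v_1^k)\, W(y_1^k|x_1^k)\, P(y_{k+1}^d)}\Big)^s 2^{s(d-k)G} \tag{143}$$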
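We can substitute this expression into the inequality for $P(F_{d,k})$; dropping the restriction $\tilde u_1 \ne u_1$ can only increase the bound:

$$P(F_{d,k}) \le 2^{s\rho(d-k)G} \sum_{u_1^d, v_1^d, x_1^d, y_1^d} Q(v_1^d)\, Q(u_1^k|v_1^k)^{1-s\rho}\, Q(u_{k+1}^d|v_{k+1}^d)\, R(x_1^d)\, W(y_1^k|x_1^k)^{1-s\rho}\, W(y_{k+1}^d|x_{k+1}^d)\, P(y_{k+1}^d)^{-s\rho} \Big(\sum_{\tilde u_1^d}\sum_{\tilde x_1^d} R(\tilde x_1^d)\, Q(\tilde u_1^d|v_1^d)^s\, W(y_1^d|\tilde x_1^d)^s\Big)^\rho \tag{144}$$

Now set $s = 1/(1+\rho)$, so that

$$s\rho = \frac{\rho}{1+\rho}, \qquad 1 - s\rho = \frac{1}{1+\rho}. \tag{145}$$

The inner sum factors into a source part $\sum_{\tilde u_1^d} Q(\tilde u_1^d|v_1^d)^s$ and a channel part $\sum_{\tilde x_1^d} R(\tilde x_1^d) W(y_1^d|\tilde x_1^d)^s$. To further reduce this expression, notice $P(F_{d,k}) \le A \cdot B \cdot 2^{\frac{\rho}{1+\rho}(d-k)G}$, where

$$A \triangleq \sum_{u_1^d, v_1^d} Q(v_1^d)\, Q(u_1^k|v_1^k)^{\frac{1}{1+\rho}}\, Q(u_{k+1}^d|v_{k+1}^d)\Big(\sum_{\tilde u_1^d} Q(\tilde u_1^d|v_1^d)^{\frac{1}{1+\rho}}\Big)^\rho \tag{148}$$

$$B \triangleq \sum_{x_1^d, y_1^d} R(x_1^d)\, W(y_1^k|x_1^k)^{\frac{1}{1+\rho}}\, W(y_{k+1}^d|x_{k+1}^d)\, P(y_{k+1}^d)^{-\frac{\rho}{1+\rho}}\Big(\sum_{\tilde x_1^d} R(\tilde x_1^d)\, W(y_1^d|\tilde x_1^d)^{\frac{1}{1+\rho}}\Big)^\rho \tag{149}$$

Now, we work on each term individually. $A$ can be written in two parts,

$$A = A_1 \cdot A_2, \tag{150}$$

where $A_1$ is the term corresponding to the letters from time 1 to $k$ and $A_2$ is the term corresponding to letters from time $k+1$ to $d$.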
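Explanations for steps are given after the equations.

$$A_1 \triangleq \sum_{v_1^k} Q(v_1^k) \sum_{u_1^k} Q(u_1^k|v_1^k)^{\frac{1}{1+\rho}}\Big(\sum_{\tilde u_1^k} Q(\tilde u_1^k|v_1^k)^{\frac{1}{1+\rho}}\Big)^\rho \tag{151}$$

$$\overset{(a)}{=} \sum_{v_1^k} Q(v_1^k) \sum_{u_1}\cdots\sum_{u_k}\prod_{l=1}^k Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big(\sum_{\tilde u_1}\cdots\sum_{\tilde u_k}\prod_{j=1}^k Q(\tilde u_j|v_j)^{\frac{1}{1+\rho}}\Big)^\rho \tag{152}$$

$$\overset{(b)}{=} \sum_{v_1^k} Q(v_1^k) \sum_{u_1}\cdots\sum_{u_k}\prod_{l=1}^k Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big(\prod_{j=1}^k\sum_{\tilde u_j} Q(\tilde u_j|v_j)^{\frac{1}{1+\rho}}\Big)^\rho \tag{153}$$

$$\overset{(c)}{=} \sum_{v_1^k} Q(v_1^k) \Big(\prod_{l=1}^k\sum_{u_l} Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big)\Big(\prod_{j=1}^k\sum_{\tilde u_j} Q(\tilde u_j|v_j)^{\frac{1}{1+\rho}}\Big)^\rho \tag{154}$$

$$\overset{(d)}{=} \sum_{v_1^k} Q(v_1^k) \Big(\prod_{l=1}^k\sum_{u_l} Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big)\Big(\prod_{l=1}^k\sum_{u_l} Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big)^\rho \tag{155}$$

$$\overset{(e)}{=} \sum_{v_1^k} Q(v_1^k) \prod_{l=1}^k\Big(\sum_{u_l} Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big)^{1+\rho} \tag{156}$$

$$\overset{(f)}{=} \sum_{v_1}\cdots\sum_{v_k}\prod_{l=1}^k Q(v_l)\Big(\sum_{u_l} Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big)^{1+\rho} \tag{157}$$

$$\overset{(g)}{=} \prod_{l=1}^k \sum_{v_l} Q(v_l)\Big(\sum_{u_l} Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big)^{1+\rho} \tag{158}$$

$$\overset{(h)}{=} \exp_2\Big(k\log_2\sum_{v\in\mathcal V} Q(v)\Big(\sum_{u\in\mathcal U} Q(u|v)^{\frac{1}{1+\rho}}\Big)^{1+\rho}\Big) \tag{159}$$

a) Memorylessness of source.

b) Sums and products commute.

c) Same as last step.

d) Replace dummy variables.

e) Combine common terms.

f) Memorylessness of source.

g) Commuting sum and product.

h) Dummy variable replacement; each of the $k$ terms is the same.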
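Similarly, we work out $A_2$ below.

$$A_2 \triangleq \sum_{v_{k+1}^d} Q(v_{k+1}^d) \sum_{u_{k+1}^d} Q(u_{k+1}^d|v_{k+1}^d)\Big(\sum_{\tilde u_{k+1}^d} Q(\tilde u_{k+1}^d|v_{k+1}^d)^{\frac{1}{1+\rho}}\Big)^\rho \tag{160}$$

$$\overset{(a)}{=} \sum_{v_{k+1}^d} Q(v_{k+1}^d)\Big(\sum_{\tilde u_{k+1}^d} Q(\tilde u_{k+1}^d|v_{k+1}^d)^{\frac{1}{1+\rho}}\Big)^\rho \tag{161}$$

$$\overset{(b)}{=} \sum_{v_{k+1}^d} Q(v_{k+1}^d)\Big(\sum_{u_{k+1}^d} Q(u_{k+1}^d|v_{k+1}^d)^{\frac{1}{1+\rho}}\Big)^\rho \tag{162}$$

$$\overset{(c)}{=} \prod_{l=k+1}^d \sum_{v_l} Q(v_l)\Big(\sum_{u_l} Q(u_l|v_l)^{\frac{1}{1+\rho}}\Big)^\rho \tag{163}$$

$$\overset{(d)}{=} \exp_2\Big((d-k)\log_2\sum_{v\in\mathcal V} Q(v)\Big(\sum_{u\in\mathcal U} Q(u|v)^{\frac{1}{1+\rho}}\Big)^\rho\Big) \tag{164}$$

a) The sum of the probabilities in a conditional distribution is 1.

b) Replace dummy variable.

c) Memorylessness of source.

d) All $d-k$ terms in the product are the same.

Now use the definitions of $E_{si}$ and $F_{si}$ to write $A$ as:

$$A = A_1 \cdot A_2 \tag{165}$$

$$= \exp_2\big(kE_{si}(\rho) + (d-k)F_{si}(\rho)\big) \tag{166}$$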
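Analogously, we will write $B = B_1 \cdot B_2$, where $B_1$ is the product of terms concerning time 1 to $k$ and $B_2$ is the product of terms concerning time $k+1$ to $d$.

$$B_1 \triangleq \sum_{y_1^k}\sum_{x_1^k} R(x_1^k)\, W(y_1^k|x_1^k)^{\frac{1}{1+\rho}}\Big(\sum_{\tilde x_1^k} R(\tilde x_1^k)\, W(y_1^k|\tilde x_1^k)^{\frac{1}{1+\rho}}\Big)^\rho \tag{169}$$

$$\overset{(a)}{=} \sum_{y_1^k}\Big(\sum_{x_1^k} R(x_1^k)\, W(y_1^k|x_1^k)^{\frac{1}{1+\rho}}\Big)^{1+\rho} \tag{170}$$

$$\overset{(b)}{=} \prod_{l=1}^k \sum_{y_l}\Big(\sum_{x_l} R(x_l)\, W(y_l|x_l)^{\frac{1}{1+\rho}}\Big)^{1+\rho} \tag{171}$$

$$\overset{(c)}{=} \Big(\sum_{y\in\mathcal Y}\Big(\sum_{x\in\mathcal X} R(x)\, W(y|x)^{\frac{1}{1+\rho}}\Big)^{1+\rho}\Big)^k \tag{172}$$

$$= \exp_2\Big(k\log_2\sum_{y\in\mathcal Y}\Big(\sum_{x\in\mathcal X} R(x)\, W(y|x)^{\frac{1}{1+\rho}}\Big)^{1+\rho}\Big) \tag{173}$$

a) Replace dummy variables and combine common terms.

b) Use memorylessness of the channel and the IID channel input generation, and commute products with sums.

c) All $k$ terms in the product are the same; replace the dummy variables.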
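Similarly for $B_2$,

$$B_2 \triangleq \sum_{y_{k+1}^d}\sum_{x_{k+1}^d} R(x_{k+1}^d)\, W(y_{k+1}^d|x_{k+1}^d)\Big(\sum_{\tilde x_{k+1}^d} R(\tilde x_{k+1}^d)\Big(\frac{W(y_{k+1}^d|\tilde x_{k+1}^d)}{P(y_{k+1}^d)}\Big)^{\frac{1}{1+\rho}}\Big)^\rho \tag{174}$$

$$\overset{(a)}{=} \sum_{y_{k+1}^d} P(y_{k+1}^d)\Big(\sum_{\tilde x_{k+1}^d} R(\tilde x_{k+1}^d)\Big(\frac{W(y_{k+1}^d|\tilde x_{k+1}^d)}{P(y_{k+1}^d)}\Big)^{\frac{1}{1+\rho}}\Big)^\rho \tag{175}$$

$$\overset{(b)}{=} \sum_{y_{k+1}^d} P(y_{k+1}^d)\Big(\sum_{x_{k+1}^d} R(x_{k+1}^d)\Big(\frac{W(y_{k+1}^d|x_{k+1}^d)}{P(y_{k+1}^d)}\Big)^{\frac{1}{1+\rho}}\Big)^\rho \tag{176}$$

$$\overset{(c)}{=} \sum_{y_{k+1}^d} P(y_{k+1}^d)^{\frac{1}{1+\rho}}\Big(\sum_{x_{k+1}^d} R(x_{k+1}^d)\, W(y_{k+1}^d|x_{k+1}^d)^{\frac{1}{1+\rho}}\Big)^\rho \tag{177}$$

$$\overset{(d)}{=} \prod_{l=k+1}^d \sum_{y_l} P(y_l)^{\frac{1}{1+\rho}}\Big(\sum_{x_l} R(x_l)\, W(y_l|x_l)^{\frac{1}{1+\rho}}\Big)^\rho \tag{178}$$

$$\overset{(e)}{=} \exp_2\Big((d-k)\log_2\sum_{y\in\mathcal Y} P(y)\Big(\sum_{x\in\mathcal X} R(x)\Big(\frac{W(y|x)}{P(y)}\Big)^{\frac{1}{1+\rho}}\Big)^\rho\Big) \tag{179}$$

a) Total probability: the sum over $x_{k+1}^d$ of $R(x_{k+1}^d)W(y_{k+1}^d|x_{k+1}^d)$ equals $P(y_{k+1}^d)$.

b) Replace dummy variables.

c) Move $P(y_{k+1}^d)$ out of the second sum.

d) Memorylessness of the channel, IID channel input generation, and commuting products with sums.

e) All $d-k$ terms are the same; replace dummy variables.

Use the definitions of $E_0$ and $F$ and substitute for $B_1$ and $B_2$ to get:

$$B = B_1 \cdot B_2 \tag{180}$$

$$= \exp_2\big(-kE_0(\rho) - (d-k)F(\rho)\big) \tag{181}$$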
Finally, we can put everything together:

$$P(F_{d,k}) \le A \cdot B \cdot 2^{\frac{\rho}{1+\rho}(d-k)G} \tag{183}$$

$$= \exp_2\Big((d-k)\frac{\rho}{1+\rho}G + kE_{si}(\rho) + (d-k)F_{si}(\rho) - kE_0(\rho) - (d-k)F(\rho)\Big) \tag{184}$$

$$P(F_d) \le \sum_{k=1}^d P(F_{d,k}) \tag{185}$$

$$\le \sum_{k=1}^d \exp_2\Big((d-k)\frac{\rho}{1+\rho}G + kE_{si}(\rho) + (d-k)F_{si}(\rho) - kE_0(\rho) - (d-k)F(\rho)\Big) \tag{186}$$

$$= \exp_2\Big(d\Big(\frac{\rho}{1+\rho}G + F_{si}(\rho) - F(\rho)\Big)\Big)\sum_{k=1}^d \exp_2\Big(k\Big(E_{si}(\rho) - F_{si}(\rho) - E_0(\rho) + F(\rho) - \frac{\rho}{1+\rho}G\Big)\Big) \tag{187}$$

Now suppose that $\lambda \ne 1$. The only thing that would change is that instead of $d$ channel inputs and outputs, there would be $\lambda d$ channel inputs and outputs. The independence of the channel and source straightforwardly gives:

$$P(F_{d,k}) \le \exp_2\Big((d-k)\frac{\rho}{1+\rho}G + kE_{si}(\rho) + (d-k)F_{si}(\rho) - k\lambda E_0(\rho) - (d-k)\lambda F(\rho)\Big) \tag{188}$$

$$P(F_d) \le \exp_2\Big(d\Big(\frac{\rho}{1+\rho}G + F_{si}(\rho) - \lambda F(\rho)\Big)\Big)\sum_{k=1}^d \exp_2\Big(k\Big(E_{si}(\rho) - F_{si}(\rho) - \lambda E_0(\rho) + \lambda F(\rho) - \frac{\rho}{1+\rho}G\Big)\Big) \tag{189}$$

Now, we assume that $G < \frac{1+\rho}{\rho}\big[E_{si}(\rho) - F_{si}(\rho) - \lambda E_0(\rho) + \lambda F(\rho)\big]$, so that the coefficient of $k$ in the exponential in the sum is positive. Then the total sum can be bounded by $d$ times the $d$-th term in the sum:

$$P(F_d) \le d \exp_2\big(-d(\lambda E_0(\rho) - E_{si}(\rho))\big) \tag{190}$$

The derivative at zero of $E_0$ is $I(R,W)$, where

$$I(R,W) = \sum_{x\in\mathcal X,\, y\in\mathcal Y} R(x) W(y|x) \log_2 \frac{W(y|x)}{P(y)} \tag{191}$$

and the derivative of $E_{si}$ at zero is $H(U|V)$, so if $H(U|V) < \lambda I(R,W)$, there is some $\rho \in (0,1]$ such that the difference $\lambda E_0(\rho) - E_{si}(\rho)$ is strictly positive. The $\rho$ can be optimized to give the source-channel random coding with side information exponent $E_{r,jscsi}(\lambda) = \max_{\rho\in[0,1]} \lambda E_0(\rho) - E_{si}(\rho)$. ■
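For a concrete channel, the exponent $\sup_{\rho\in[0,1]} \lambda E_0(\rho) - E_{si}(\rho)$ can be evaluated in the same way. The Python sketch below computes Gallager's $E_0(\rho) = -\log_2 \sum_y (\sum_x R(x) W(y|x)^{1/(1+\rho)})^{1+\rho}$ for a general transition matrix. It pairs this with the BSC side-information source of Example 4.2 ($\epsilon = 0.1$). As example choices of ours (not the paper's), the channel is a BSC with crossover probability $p$ and uniform inputs $R(x) = 1/2$. With $\lambda = 1$, the exponent is positive when $H(U|V) = h_b(0.1) \approx 0.469$ is below $I(R,W) = 1 - h_b(p)$, for instance at $p = 0.05$. It is zero at $p = 0.2$, where $1 - h_b(0.2) \approx 0.278$.

```python
import math


def gallager_e0(rho, r_in, w):
    """E_0(rho) = -log2 sum_y (sum_x R(x) W(y|x)^(1/(1+rho)))^(1+rho); w[x][y] = W(y|x)."""
    a = 1.0 / (1.0 + rho)
    total = 0.0
    for y in range(len(w[0])):
        total += sum(r_in[x] * w[x][y] ** a for x in range(len(r_in))) ** (1.0 + rho)
    return -math.log2(total)


def e_si_bsc(rho, eps):
    """E_si(rho) for the BSC side-information source of Example 4.2."""
    a = 1.0 / (1.0 + rho)
    return (1.0 + rho) * math.log2(eps ** a + (1.0 - eps) ** a)


def jscsi_exponent(lam, eps, r_in, w, points=2001):
    """Grid approximation of sup_{rho in [0,1]} lam*E_0(rho) - E_si(rho)."""
    grid = (k / (points - 1) for k in range(points))
    return max(lam * gallager_e0(rho, r_in, w) - e_si_bsc(rho, eps) for rho in grid)


def bsc(p):
    return [[1.0 - p, p], [p, 1.0 - p]]


uniform = [0.5, 0.5]
for p in (0.05, 0.2):
    print(p, round(jscsi_exponent(1.0, 0.1, uniform, bsc(p)), 4))
```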

D. Random variable of computation - source coding with side information

Theorem 7 (Restatement of Theorem 3): Suppose that the decoder has access to the side information and there is a rate $R$ noiseless, binary channel between the encoder and decoder. Fix any $\gamma \in (0, 1]$. For the encoder/decoder of Section III-B, if the bias $G$ satisfies

$$\frac{1+\gamma}{\gamma}\, G_{si}(\gamma) < G < \frac{1+\gamma}{\gamma}\big(\gamma R - F_{si}(\gamma)\big) \tag{192}$$

then the $\gamma$-th moment of computation is uniformly finite for all $i$, i.e., $\exists K < \infty$ such that $\forall i$, $E[N_i^\gamma] < K$, if

$$R > \frac{E_{si}(\gamma)}{\gamma}$$

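Proof: Recall that

$$E[N^\gamma] \le \sum_{l=1}^\infty\sum_{k=1}^\infty A_{l,k} \tag{193}$$

$$A_{l,k} \triangleq E\Big[\Big(\sum_{\tilde u_1^l:\,\tilde u_1\ne u_1}\mathbf 1\big(\Gamma(\tilde u_1^l) \ge \Gamma(u_1^k)\big)\Big)^\gamma\Big] \tag{194}$$

From Section VI-B (see (97)), we already know that if $l \ge k$,

$$A_{l,k} \le \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G + kE_{si}(\gamma) + (l-k)F_{si}(\gamma) - l\gamma R\Big) \tag{195}$$

If $l < k$, we have

$$A_{l,k} = \sum_{u_1^k, v_1^k} Q(u_1^k, v_1^k)\, E\Big[\Big(\sum_{\tilde u_1^l:\,\tilde u_1\ne u_1}\mathbf 1\big(\Gamma(\tilde u_1^l) \ge \Gamma(u_1^k)\big)\Big)^\gamma \,\Big|\, u_1^k, v_1^k\Big] \tag{196}$$

$$\overset{(a)}{\le} \sum_{u_1^k, v_1^k} Q(u_1^k, v_1^k)\Big(\sum_{\tilde u_1^l:\,\tilde u_1\ne u_1} E\big[\mathbf 1(\text{parities of } \tilde u_1^l \text{ match})\big]\, \mathbf 1\big(\Gamma(\tilde u_1^l) \ge \Gamma(u_1^k)\big)\Big)^\gamma \tag{197}$$

$$\le \sum_{u_1^k, v_1^k} Q(u_1^k, v_1^k)\Big(\sum_{\tilde u_1^l} 2^{-lR} \exp_2\Big(\frac{\Gamma(\tilde u_1^l) - \Gamma(u_1^k)}{1+\gamma}\Big)\Big)^\gamma \tag{198}$$

(a) uses Jensen's inequality followed by linearity of conditional expectation. The parity generation process is independent on different branches of the encoding tree, so a false path matches the received parities with probability $2^{-lR}$, and

$$\exp_2\big(\Gamma(\tilde u_1^l) - \Gamma(u_1^k)\big) = \frac{Q(\tilde u_1^l|v_1^l)}{Q(u_1^l|v_1^l)\, Q(u_{l+1}^k|v_{l+1}^k)}\, 2^{(l-k)G} \tag{199}$$

so substituting gives

$$A_{l,k} \le \sum_{u_1^k, v_1^k} Q(u_1^k, v_1^k)\Big(\sum_{\tilde u_1^l} 2^{-lR}\Big(\frac{Q(\tilde u_1^l|v_1^l)}{Q(u_1^l|v_1^l)\, Q(u_{l+1}^k|v_{l+1}^k)}\Big)^{\frac{1}{1+\gamma}} 2^{\frac{(l-k)G}{1+\gamma}}\Big)^\gamma \tag{200}$$

$$= \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G - l\gamma R\Big)\sum_{u_1^k, v_1^k} Q(u_1^k, v_1^k)\, Q(u_1^l|v_1^l)^{-\frac{\gamma}{1+\gamma}}\, Q(u_{l+1}^k|v_{l+1}^k)^{-\frac{\gamma}{1+\gamma}}\Big(\sum_{\tilde u_1^l} Q(\tilde u_1^l|v_1^l)^{\frac{1}{1+\gamma}}\Big)^\gamma \tag{201}$$

$$= \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G - l\gamma R\Big) \cdot C \cdot D \tag{202}$$

where

$$C \triangleq \sum_{u_1^l, v_1^l} Q(u_1^l, v_1^l)\, Q(u_1^l|v_1^l)^{-\frac{\gamma}{1+\gamma}}\Big(\sum_{\tilde u_1^l} Q(\tilde u_1^l|v_1^l)^{\frac{1}{1+\gamma}}\Big)^\gamma \tag{203}$$

$$D \triangleq \sum_{u_{l+1}^k, v_{l+1}^k} Q(u_{l+1}^k, v_{l+1}^k)\, Q(u_{l+1}^k|v_{l+1}^k)^{-\frac{\gamma}{1+\gamma}} \tag{204}$$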
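The terms corresponding to letters from time 1 to $l$, i.e., $C$, are the same as (116) in Section VI-B (with $\rho$ replaced by $\gamma$), so we have

$$A_{l,k} \le \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G - l\gamma R + lE_{si}(\gamma)\Big) \cdot D \tag{206}$$

The term $D$ can be simplified into an exponential form using $G_{si}$:

$$D = \sum_{v_{l+1}^k, u_{l+1}^k} Q(v_{l+1}^k)\, Q(u_{l+1}^k|v_{l+1}^k)^{\frac{1}{1+\gamma}} \tag{207}$$

$$= \prod_{m=l+1}^k \sum_{v_m} Q(v_m) \sum_{u_m} Q(u_m|v_m)^{\frac{1}{1+\gamma}} \tag{208}$$

$$= \Big(\sum_{v\in\mathcal V} Q(v) \sum_{u\in\mathcal U} Q(u|v)^{\frac{1}{1+\gamma}}\Big)^{k-l} \tag{210}$$

$$= \exp_2\big((k-l)G_{si}(\gamma)\big) \tag{211}$$

So if $k > l$ we have:

$$A_{l,k} \le \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G - l\gamma R + lE_{si}(\gamma) + (k-l)G_{si}(\gamma)\Big) \tag{212}$$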

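Combining the bounds gives

$$E[N^\gamma] \le \sum_{l=1}^\infty \sum_{k=1}^\infty A_{l,k} \tag{213}$$

$$= \sum_{l=1}^\infty \sum_{k=l}^\infty A_{l,k} + \sum_{k=1}^\infty \sum_{l=k+1}^\infty A_{l,k} \tag{214}$$

$$\le \sum_{l=1}^\infty \sum_{k=l}^\infty A_{l,k} + \sum_{k=1}^\infty \sum_{l=k}^\infty A_{l,k} \tag{215}$$

$$\overset{(a)}{\le} \sum_{l=1}^\infty \sum_{k=l}^\infty \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G - l\gamma R + lE_{si}(\gamma) + (k-l)G_{si}(\gamma)\Big) + \sum_{k=1}^\infty \sum_{l=k}^\infty \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G + kE_{si}(\gamma) + (l-k)F_{si}(\gamma) - l\gamma R\Big) \tag{216}$$

$$\overset{(b)}{=} \sum_{l=1}^\infty \exp_2\big(-l(\gamma R - E_{si}(\gamma))\big) \sum_{k=l}^\infty \exp_2\Big(-(k-l)\Big(\frac{\gamma}{1+\gamma}G - G_{si}(\gamma)\Big)\Big) + \sum_{k=1}^\infty \exp_2\big(-k(\gamma R - E_{si}(\gamma))\big) \sum_{l=k}^\infty \exp_2\Big(-(l-k)\Big(-\frac{\gamma}{1+\gamma}G + \gamma R - F_{si}(\gamma)\Big)\Big) \tag{217}$$

a) Substitute for $A_{l,k}$.

b) Add and subtract $(l-k)\gamma R$ in the exponent of the second double sum.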
The above sums converge if the following conditions are met:

$$\gamma R > E_{si}(\gamma) \tag{218}$$

$$\frac{\gamma}{1+\gamma}\, G > G_{si}(\gamma) \tag{219}$$

$$\frac{\gamma}{1+\gamma}\, G < \gamma R - F_{si}(\gamma) \tag{220}$$

This concludes the proof assuming these conditions hold. To see that (219) and (220) can be satisfied by one choice of bias assuming (218), see Section VI-F. ■
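The conditions (218) to (220) are easy to check numerically. The Python sketch below computes $E_{si}$, $F_{si}$ and $G_{si}$ through $H(v) = \sum_u Q(u|v)^{1/(1+\gamma)}$, as in Section VI-F, and returns the bias interval of (192); the helper names are ours. For the BSC side-information source with $\epsilon = 0.1$, $\gamma = 1$ and $R = 0.7$, the interval is roughly (0.678, 0.722). By this calculation, the bias 0.7 used in the simulation of Example 4.2 satisfies the conditions for a finite mean computation.

```python
import math


def si_functions(gamma, q_v, q_u_given_v):
    """(E_si, F_si, G_si) at gamma, via H(v) = sum_u Q(u|v)^(1/(1+gamma)):
    E_si = log2 E[H(V)^(1+gamma)], F_si = log2 E[H(V)^gamma], G_si = log2 E[H(V)]."""
    h = [sum(p ** (1.0 / (1.0 + gamma)) for p in row) for row in q_u_given_v]
    e_si = math.log2(sum(q * hv ** (1.0 + gamma) for q, hv in zip(q_v, h)))
    f_si = math.log2(sum(q * hv ** gamma for q, hv in zip(q_v, h)))
    g_si = math.log2(sum(q * hv for q, hv in zip(q_v, h)))
    return e_si, f_si, g_si


def bias_interval(gamma, rate, q_v, q_u_given_v):
    """Endpoints of (192) and the rate condition gamma*R > E_si(gamma) of (218)."""
    e_si, f_si, g_si = si_functions(gamma, q_v, q_u_given_v)
    scale = (1.0 + gamma) / gamma
    return scale * g_si, scale * (gamma * rate - f_si), gamma * rate > e_si


eps = 0.1
q_v = [0.5, 0.5]
q_u_given_v = [[1 - eps, eps], [eps, 1 - eps]]    # row v lists Q(u = 0|v), Q(u = 1|v)
lo, hi, rate_ok = bias_interval(1.0, 0.7, q_v, q_u_given_v)
print(round(lo, 3), round(hi, 3), rate_ok)
```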

E. Random variable of computation - joint source channel coding with side information

Theorem 8 (Restatement of Theorem 4): Suppose there is a channel $W$ between the encoder and the decoder and side information is available to the decoder. Fix any $\gamma \in (0, 1]$. For the encoder/decoder of Sections III-B and III-C, if the bias $G$ satisfies

$$\frac{1+\gamma}{\gamma}\big(G_{si}(\gamma) - \lambda G(\gamma)\big) < G < \frac{1+\gamma}{\gamma}\big(\lambda F(\gamma) - F_{si}(\gamma)\big) \tag{221}$$

then the $\gamma$-th moment of computation is uniformly finite for all $i$, i.e., $\exists K < \infty$ such that $\forall i$, $E[N_i^\gamma] < K$, if

$$\lambda E_0(\gamma) > E_{si}(\gamma) \tag{222}$$
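Proof: Again, we will show this for $\lambda = 1$ and at the end see how it changes for $\lambda \ne 1$. Recall that

$$E[N^\gamma] \le \sum_{l=1}^\infty\sum_{k=1}^\infty A_{l,k}, \qquad A_{l,k} \triangleq E\Big[\Big(\sum_{\tilde u_1^l:\,\tilde u_1\ne u_1}\mathbf 1\big(\Gamma(\tilde u_1^l) \ge \Gamma(u_1^k)\big)\Big)^\gamma\Big] \tag{223}$$

From (138) in Section VI-C we already know that if $l \ge k$,

$$A_{l,k} \le \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G + kE_{si}(\gamma) + (l-k)F_{si}(\gamma) - kE_0(\gamma) - (l-k)F(\gamma)\Big) \tag{224}$$

If $l < k$, we have

$$A_{l,k} = \sum_{x_1^k, y_1^k, u_1^k, v_1^k} Q(u_1^k, v_1^k) R(x_1^k) W(y_1^k|x_1^k)\, E\Big[\Big(\sum_{\tilde u_1^l:\,\tilde u_1\ne u_1}\mathbf 1\big(\Gamma(\tilde u_1^l) \ge \Gamma(u_1^k)\big)\Big)^\gamma\Big] \tag{225}$$

$$\overset{(a)}{\le} \sum_{x_1^k, y_1^k, u_1^k, v_1^k} Q(u_1^k, v_1^k) R(x_1^k) W(y_1^k|x_1^k) \Big(E\Big[\sum_{\tilde u_1^l:\,\tilde u_1\ne u_1}\mathbf 1\big(\Gamma(\tilde u_1^l) \ge \Gamma(u_1^k)\big)\Big]\Big)^\gamma \tag{226}$$

$$\le \sum_{x_1^k, y_1^k, u_1^k, v_1^k} Q(u_1^k, v_1^k) R(x_1^k) W(y_1^k|x_1^k) \Big(E\Big[\sum_{\tilde u_1^l} \exp_2\Big(\frac{\Gamma(\tilde u_1^l) - \Gamma(u_1^k)}{1+\gamma}\Big)\Big]\Big)^\gamma \tag{227}$$

$$\overset{(b)}{=} \sum_{x_1^k, y_1^k, u_1^k, v_1^k} Q(u_1^k, v_1^k) R(x_1^k) W(y_1^k|x_1^k) \Big(\sum_{\tilde u_1^l}\sum_{\tilde x_1^l} R(\tilde x_1^l) \exp_2\Big(\frac{\Gamma(\tilde u_1^l) - \Gamma(u_1^k)}{1+\gamma}\Big)\Big)^\gamma \tag{228}$$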
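Then, we work with $C$ and $D$ individually to get:

$$C = \sum_{v_{l+1}^k, u_{l+1}^k} Q(v_{l+1}^k)\, Q(u_{l+1}^k|v_{l+1}^k)^{\frac{1}{1+\gamma}} \tag{237}$$

$$= \Big(\sum_{v} Q(v) \sum_{u} Q(u|v)^{\frac{1}{1+\gamma}}\Big)^{k-l} \tag{238}$$

$$D = \sum_{y_{l+1}^k, x_{l+1}^k} R(x_{l+1}^k)\, W(y_{l+1}^k|x_{l+1}^k)^{\frac{1}{1+\gamma}}\, P(y_{l+1}^k)^{\frac{\gamma}{1+\gamma}} \tag{239}$$

$$= \Big(\sum_{y} P(y)^{\frac{\gamma}{1+\gamma}} \sum_{x} R(x)\, W(y|x)^{\frac{1}{1+\gamma}}\Big)^{k-l} \tag{240}$$

$$= \Big(\sum_{y} P(y) \sum_{x} R(x)\Big(\frac{W(y|x)}{P(y)}\Big)^{\frac{1}{1+\gamma}}\Big)^{k-l} \tag{241}$$

So finally for $k > l$, using the definitions of $G_{si}$ and $G$ gives

$$A_{l,k} \le \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G + lE_{si}(\gamma) + (k-l)G_{si}(\gamma) - lE_0(\gamma) - (k-l)G(\gamma)\Big) \tag{243}$$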

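Now we split the double sum in the bound of $E[N^\gamma]$ and use the two cases of $l, k$ to get:

$$E[N^\gamma] \le \sum_{l=1}^\infty \sum_{k=1}^\infty A_{l,k} \tag{244}$$

$$= \sum_{l=1}^\infty \sum_{k=l}^\infty A_{l,k} + \sum_{k=1}^\infty \sum_{l=k+1}^\infty A_{l,k} \tag{245}$$

$$\le \sum_{l=1}^\infty \sum_{k=l}^\infty A_{l,k} + \sum_{k=1}^\infty \sum_{l=k}^\infty A_{l,k} \tag{246}$$

$$\le \sum_{l=1}^\infty \sum_{k=l}^\infty \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G + lE_{si}(\gamma) + (k-l)G_{si}(\gamma) - lE_0(\gamma) - (k-l)G(\gamma)\Big) + \sum_{k=1}^\infty \sum_{l=k}^\infty \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G + kE_{si}(\gamma) + (l-k)F_{si}(\gamma) - kE_0(\gamma) - (l-k)F(\gamma)\Big) \tag{247}$$

$$= \sum_{l=1}^\infty \exp_2\big(l(E_{si}(\gamma) - E_0(\gamma))\big) \sum_{k=l}^\infty \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G + (k-l)G_{si}(\gamma) - (k-l)G(\gamma)\Big) + \sum_{k=1}^\infty \exp_2\big(k(E_{si}(\gamma) - E_0(\gamma))\big) \sum_{l=k}^\infty \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G + (l-k)F_{si}(\gamma) - (l-k)F(\gamma)\Big) \tag{248}$$

Now, if $\lambda \ne 1$, we would instead have

$$E[N^\gamma] \le \sum_{l=1}^\infty \exp_2\big(l(E_{si}(\gamma) - \lambda E_0(\gamma))\big) \sum_{k=l}^\infty \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G + (k-l)G_{si}(\gamma) - \lambda(k-l)G(\gamma)\Big) + \sum_{k=1}^\infty \exp_2\big(k(E_{si}(\gamma) - \lambda E_0(\gamma))\big) \sum_{l=k}^\infty \exp_2\Big((l-k)\frac{\gamma}{1+\gamma}G + (l-k)F_{si}(\gamma) - \lambda(l-k)F(\gamma)\Big) \tag{249}$$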




The above sums converge if the following conditions are met:

$$\lambda E_0(\gamma) > E_{si}(\gamma) \tag{250}$$

$$\frac{\gamma}{1+\gamma}\, G > G_{si}(\gamma) - \lambda G(\gamma) \tag{251}$$

$$\frac{\gamma}{1+\gamma}\, G < \lambda F(\gamma) - F_{si}(\gamma) \tag{252}$$

Condition (250) is effectively the requirement that the source coding computational cutoff rate for the $\gamma$-th moment is lower than the channel coding cutoff rate for the $\gamma$-th moment. This is needed in this case even though we are using joint source-channel coding. Conditions (251) and (252) combined require

$$\frac{1+\gamma}{\gamma}\big(G_{si}(\gamma) - \lambda G(\gamma)\big) < G < \frac{1+\gamma}{\gamma}\big(\lambda F(\gamma) - F_{si}(\gamma)\big) \tag{253}$$



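F. Showing the range of viable bias values is non-empty

Fix a $\gamma \in (0, 1]$. For each $v \in \mathcal V$ and $y \in \mathcal Y$ define $H(v)$ and $J(y)$ as:

$$H(v) = \sum_{u\in\mathcal U} Q(u|v)^{\frac{1}{1+\gamma}} \tag{254}$$

$$J(y) = \sum_{x\in\mathcal X} R(x)\Big(\frac{W(y|x)}{P(y)}\Big)^{\frac{1}{1+\gamma}} \tag{255}$$

If we consider $V$ to be a random variable with distribution $Q(v)$ on $\mathcal V$ and $Y$ to be a random variable with distribution $P(y)$ on $\mathcal Y$, then by definition we have the following relations:

$$E_{si}(\gamma) = \log_2 E[H(V)^{1+\gamma}] \tag{256}$$

$$F_{si}(\gamma) = \log_2 E[H(V)^\gamma] \tag{257}$$

$$G_{si}(\gamma) = \log_2 E[H(V)] \tag{258}$$

$$E_0(\gamma) = -\log_2 E[J(Y)^{1+\gamma}] \tag{259}$$

$$F(\gamma) = -\log_2 E[J(Y)^\gamma] \tag{260}$$

$$G(\gamma) = -\log_2 E[J(Y)] \tag{261}$$

By repeated use of Jensen's inequality (convexity of $x^{1+\gamma}$ for the first step and concavity of $x^\gamma$ for the second), since $\gamma \in [0, 1]$, we also have

$$E[H(V)^{1+\gamma}] \ge E[H(V)]\, E[H(V)]^\gamma \tag{262}$$

$$\ge E[H(V)]\, E[H(V)^\gamma] \tag{263}$$

$$E[J(Y)^{1+\gamma}] \ge E[J(Y)]\, E[J(Y)]^\gamma \tag{264}$$

$$\ge E[J(Y)]\, E[J(Y)^\gamma] \tag{265}$$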
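The inequalities (262) to (265), and the consequences drawn from them in (266) and (267), can be sanity-checked on random distributions. The Python sketch below draws random $Q(v)$, $Q(u|v)$, input distributions $R(x)$ and channels $W(y|x)$. It evaluates the six functions through $H(V)$ and $J(Y)$ as in (256) to (261). It then checks $E_{si}(\gamma) \ge F_{si}(\gamma) + G_{si}(\gamma)$ and $E_0(\gamma) \le F(\gamma) + G(\gamma)$. All entries are kept strictly positive, an assumption of ours that avoids division by zero in $J(y)$. This is a numerical illustration, not a proof; the helper names are ours.

```python
import math
import random


def random_pmf(rng, size):
    weights = [rng.random() + 1e-6 for _ in range(size)]   # strictly positive entries
    total = sum(weights)
    return [w / total for w in weights]


def check_266_267(trials=2000, seed=0, tol=1e-12):
    rng = random.Random(seed)
    for _ in range(trials):
        gamma = rng.uniform(0.01, 1.0)
        a = 1.0 / (1.0 + gamma)
        # Source side: H(v) = sum_u Q(u|v)^(1/(1+gamma)), with V ~ Q(v).
        q_v = random_pmf(rng, rng.randint(1, 4))
        h = [sum(p ** a for p in random_pmf(rng, rng.randint(2, 4))) for _ in q_v]
        e_si = math.log2(sum(q * hv ** (1 + gamma) for q, hv in zip(q_v, h)))
        f_si = math.log2(sum(q * hv ** gamma for q, hv in zip(q_v, h)))
        g_si = math.log2(sum(q * hv for q, hv in zip(q_v, h)))
        # Channel side: J(y) = sum_x R(x) (W(y|x)/P(y))^(1/(1+gamma)), with Y ~ P(y).
        nx, ny = rng.randint(2, 4), rng.randint(2, 4)
        r_in = random_pmf(rng, nx)
        w = [random_pmf(rng, ny) for _ in range(nx)]
        p_y = [sum(r_in[x] * w[x][y] for x in range(nx)) for y in range(ny)]
        j = [sum(r_in[x] * (w[x][y] / p_y[y]) ** a for x in range(nx)) for y in range(ny)]
        e0 = -math.log2(sum(p * jy ** (1 + gamma) for p, jy in zip(p_y, j)))
        f = -math.log2(sum(p * jy ** gamma for p, jy in zip(p_y, j)))
        g = -math.log2(sum(p * jy for p, jy in zip(p_y, j)))
        if e_si < f_si + g_si - tol or e0 > f + g + tol:
            return False
    return True


print(check_266_267())
```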
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Since log 2 is a monotonically increasing function, this means: 

E H (rf) > F S4 ( 7 ) + G sl ( 7 ) (266) 

E (rr) < F( 7 )+G( 7 ) (267) 
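The definitions (254)-(261) and the inequalities (266)-(267) can be checked numerically for small alphabets. The following Python sketch assumes finite alphabets stored as plain lists and uses the form of $J(y)$ in (255), which is the form that makes (259) coincide with Gallager's $E_0(\gamma)$. The function names and the example distributions are hypothetical choices made for illustration, and all quantities are in bits.

```python
import math
from typing import Sequence, Tuple


def source_functions(q_v: Sequence[float],
                     q_u_given_v: Sequence[Sequence[float]],
                     gamma: float) -> Tuple[float, float, float]:
    """(E_si, F_si, G_si) in bits, from (254) and (256)-(258)."""
    # H(v) = sum_u Q(u|v)^(1/(1+gamma)), one value per side-information symbol v
    h = [sum(p ** (1.0 / (1.0 + gamma)) for p in row) for row in q_u_given_v]
    e_si = math.log2(sum(qv * hv ** (1.0 + gamma) for qv, hv in zip(q_v, h)))
    f_si = math.log2(sum(qv * hv ** gamma for qv, hv in zip(q_v, h)))
    g_si = math.log2(sum(qv * hv for qv, hv in zip(q_v, h)))
    return e_si, f_si, g_si


def channel_functions(p_x: Sequence[float],
                      p_y_given_x: Sequence[Sequence[float]],
                      gamma: float) -> Tuple[float, float, float]:
    """(E_0, F, G) in bits, from (255) and (259)-(261).

    Row x of p_y_given_x holds P(y|x); every output must have P(y) > 0.
    """
    n_y = len(p_y_given_x[0])
    # Output distribution P(y) induced by the input distribution P(x)
    p_y = [sum(px * row[y] for px, row in zip(p_x, p_y_given_x))
           for y in range(n_y)]
    # J(y) = sum_x P(x) (P(y|x)/P(y))^(1/(1+gamma))
    j = [sum(px * (row[y] / p_y[y]) ** (1.0 / (1.0 + gamma))
             for px, row in zip(p_x, p_y_given_x))
         for y in range(n_y)]
    e0 = -math.log2(sum(py * jy ** (1.0 + gamma) for py, jy in zip(p_y, j)))
    f = -math.log2(sum(py * jy ** gamma for py, jy in zip(p_y, j)))
    g = -math.log2(sum(py * jy for py, jy in zip(p_y, j)))
    return e0, f, g


# Assumed example: binary side information and a binary symmetric channel.
gamma = 0.5
e_si, f_si, g_si = source_functions([0.5, 0.5], [[0.5, 0.5], [1.0, 0.0]], gamma)
e0, f, g = channel_functions([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], gamma)
print(e_si >= f_si + g_si)   # inequality (266)
print(e0 <= f + g + 1e-12)   # inequality (267), up to rounding
```

For the symmetric channel with uniform input, $J(y)$ is the same for every $y$, so (267) holds with equality, which is why the comparison allows a small rounding tolerance.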

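Now, if $\gamma R > E_{si}(\gamma)$, then

$$\gamma R - F_{si}(\gamma) - G_{si}(\gamma) > E_{si}(\gamma) - F_{si}(\gamma) - G_{si}(\gamma) \tag{268}$$

$$\ge 0 \tag{269}$$

where the last step uses (266).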

Hence, $\left(\frac{1+\gamma}{\gamma}G_{si}(\gamma),\ \frac{1+\gamma}{\gamma}\big(\gamma R - F_{si}(\gamma)\big)\right)$ is a non-empty open interval of bias values that give a finite $\gamma$th moment of computation if $\gamma R > E_{si}(\gamma)$, as shown in Section VI-D.
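The interval just described is simple to compute once $F_{si}(\gamma)$ and $G_{si}(\gamma)$ are known. The sketch below is a hypothetical helper, assuming $0 < \gamma \le 1$ (the endpoints involve $(1+\gamma)/\gamma$) and inputs in bits; it returns `None` when the interval is empty.

```python
from typing import Optional, Tuple


def source_bias_interval(gamma: float, rate: float, f_si: float,
                         g_si: float) -> Optional[Tuple[float, float]]:
    """Interval ((1+g)/g * G_si, (1+g)/g * (g*R - F_si)) or None if empty.

    Hypothetical helper; F_si and G_si are assumed to be evaluated in bits
    at the same gamma, with 0 < gamma <= 1.
    """
    if gamma <= 0.0 or gamma > 1.0:
        raise ValueError("gamma must lie in (0, 1]")
    scale = (1.0 + gamma) / gamma
    lower = scale * g_si
    upper = scale * (gamma * rate - f_si)
    # Non-empty exactly when gamma*R - F_si - G_si > 0, cf. (268)-(269).
    return (lower, upper) if lower < upper else None


print(source_bias_interval(1.0, 0.8, 0.271553, 0.271553))
```

The values $F_{si}(1) = G_{si}(1) \approx 0.271553$ belong to an assumed source with $Q(v) = 1/2$, where $U$ is uniform given one value of $v$ and deterministic given the other; for it $E_{si}(1) \approx 0.584963$, so at $R = 0.8$ the condition $\gamma R > E_{si}(\gamma)$ holds and the interval is non-empty, while at $R = 0.5$ it is empty.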

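For the joint source-channel case, we assume $\lambda E_0(\gamma) > E_{si}(\gamma)$. Then, using (266) and (267),

$$\lambda F(\gamma) + \lambda G(\gamma) - F_{si}(\gamma) - G_{si}(\gamma) \ge \lambda E_0(\gamma) - E_{si}(\gamma) \tag{270}$$

$$> 0 \tag{271}$$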

Hence, there is a non-empty open interval of allowable bias values in Theorem 4 if $\lambda E_0(\gamma) > E_{si}(\gamma)$.

G. Error exponent with bias set for computation 

In this section, it is shown that the bias can be set to achieve a finite $\gamma$th moment of computation while still allowing for a positive error exponent.

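In the source coding with side information case, assume $\gamma R > E_{si}(\gamma)$. Then we know (Theorem 1) that for all $\epsilon > 0$ there is a $K_\epsilon < \infty$ such that

$$P_e(d) \le K_\epsilon\, 2^{-d(\gamma R - E_{si}(\gamma) - \epsilon)} \tag{272}$$

This holds provided that the bias $G$ satisfies

$$G \le \frac{1+\gamma}{\gamma}\big[E_{si}(\gamma) - F_{si}(\gamma)\big] \tag{273}$$

Also, from Theorem 3, the $\gamma$th moment of computation is finite provided

$$\frac{1+\gamma}{\gamma}G_{si}(\gamma) < G < \frac{1+\gamma}{\gamma}\big[\gamma R - F_{si}(\gamma)\big] \tag{274}$$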

Suppose the bias is set to $G^* = \frac{1+\gamma}{\gamma}\big[E_{si}(\gamma) - F_{si}(\gamma)\big]$. Then, by (273), there is a positive error exponent with delay. This choice of bias also yields a finite $\gamma$th moment of computation. Since we assume $\gamma R > E_{si}(\gamma)$, it is immediate that $G^* < \frac{1+\gamma}{\gamma}\big[\gamma R - F_{si}(\gamma)\big]$, which is the upper inequality in (274).

For the other inequality, we need that the log function is strictly concave. Combined with the assumption that $U$ is not deterministic given $v$ for at least one $v \in \mathcal{V}$, this gives the strict inequality

$$G_{si}(\gamma) + F_{si}(\gamma) < E_{si}(\gamma) \tag{275}$$

Hence, $G^* > \frac{1+\gamma}{\gamma}G_{si}(\gamma)$ if the source $U$ is not deterministic given $v$ for at least one value of $v \in \mathcal{V}$.¹

¹If $U$ is conditionally deterministic given $v$ for all $v \in \mathcal{V}$, the source coding with side information problem is obviously not interesting, as zero rate is needed.
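The choice of $G^*$ can be checked on a concrete source. The sketch below assumes a hypothetical two-valued side information $V$ with $Q(v) = 1/2$ for both values, where $U$ is uniform given one value of $v$ and deterministic given the other; in this example $H(v)$ differs between the two values of $v$, so (275) holds strictly. The function names are hypothetical and all quantities are in bits.

```python
import math
from typing import Sequence, Tuple


def source_functions(q_v: Sequence[float],
                     q_u_given_v: Sequence[Sequence[float]],
                     gamma: float) -> Tuple[float, float, float]:
    """(E_si, F_si, G_si) in bits, from (254) and (256)-(258)."""
    h = [sum(p ** (1.0 / (1.0 + gamma)) for p in row) for row in q_u_given_v]
    e_si = math.log2(sum(qv * hv ** (1.0 + gamma) for qv, hv in zip(q_v, h)))
    f_si = math.log2(sum(qv * hv ** gamma for qv, hv in zip(q_v, h)))
    g_si = math.log2(sum(qv * hv for qv, hv in zip(q_v, h)))
    return e_si, f_si, g_si


def source_bias_choice(gamma: float, rate: float,
                       q_v: Sequence[float],
                       q_u_given_v: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (G*, lower, upper): G* = (1+gamma)/gamma [E_si - F_si] and
    the computation interval (274) as (lower, upper)."""
    e_si, f_si, g_si = source_functions(q_v, q_u_given_v, gamma)
    scale = (1.0 + gamma) / gamma
    g_star = scale * (e_si - f_si)
    lower = scale * g_si
    upper = scale * (gamma * rate - f_si)
    return g_star, lower, upper


# Assumed example: U uniform given v = 0, deterministic given v = 1.
g_star, lower, upper = source_bias_choice(1.0, 0.8, [0.5, 0.5],
                                          [[0.5, 0.5], [1.0, 0.0]])
print(lower < g_star < upper)
```

Here $E_{si}(1) = \log_2 1.5 \approx 0.585$, so $R = 0.8$ satisfies $\gamma R > E_{si}(\gamma)$ and $G^*$ lands strictly inside the interval (274).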
For the joint source-channel coding with side information case, an analogous line of reasoning shows that the choice $G^* = \frac{1+\gamma}{\gamma}\big[E_{si}(\gamma) - F_{si}(\gamma) - \lambda E_0(\gamma) + \lambda F(\gamma)\big]$ gives a positive error exponent and a finite $\gamma$th moment of computation provided $E_{si}(\gamma) < \lambda E_0(\gamma)$.
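The joint choice of $G^*$ can be compared against the interval (253) in the same way. The Python sketch below is a hypothetical helper that assumes the six exponent values (in bits) at a common $\gamma \in (0, 1]$ and the ratio $\lambda$ are already known; it simply evaluates the formula for $G^*$ and the two endpoints.

```python
from typing import Tuple


def joint_bias_choice(gamma: float, lam: float,
                      e_si: float, f_si: float, g_si: float,
                      e0: float, f: float, g: float) -> Tuple[float, float, float]:
    """Return (G*, lower, upper) for the joint source-channel case.

    G* = (1+gamma)/gamma * [E_si - F_si - lam*E_0 + lam*F], and
    (lower, upper) is the interval (253). Hypothetical helper.
    """
    scale = (1.0 + gamma) / gamma
    g_star = scale * (e_si - f_si - lam * e0 + lam * f)
    lower = scale * (g_si - lam * g)  # lower end of (253)
    upper = scale * (lam * f - f_si)  # upper end of (253)
    return g_star, lower, upper


# Assumed example values at gamma = 1 with lambda = 2.
g_star, lower, upper = joint_bias_choice(1.0, 2.0, 0.584963, 0.271553, 0.271553,
                                         0.321928, 0.160964, 0.160964)
print(lower < g_star < upper)
```

The gap between the upper endpoint and $G^*$ equals $\frac{1+\gamma}{\gamma}\big(\lambda E_0(\gamma) - E_{si}(\gamma)\big)$, which is positive exactly when $E_{si}(\gamma) < \lambda E_0(\gamma)$. The example values are those of the assumed source and binary symmetric channel used earlier, not data from the paper.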